Transcriptional Regulators and Methods Thereof

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims the benefit of the filing date of U.S. Application 
No. 60/525318, filed November 26, 2003, entitled "CONTROL OF PANCREAS AND 
LIVER GENE EXPRESSION BY HNF TRANSCRIPTION FACTORS", U.S. 
Application No. 60/542520, filed February 6, 2004, entitled "CONTROL OF 
PANCREAS AND LIVER GENE EXPRESSION BY HNF TRANSCRIPTION 
FACTORS", U.S. Application No. 60/544835, filed February 13, 2004, entitled 
"CONTROL OF PANCREAS AND LIVER GENE EXPRESSION BY HNF 
TRANSCRIPTION FACTORS", and U.S. Application No. 60/547933, filed February 
26, 2004, entitled "TRANSCRIPTIONAL REGULATORS AND METHODS 
THEREOF". The entire teachings of the referenced applications are incorporated by 
reference herein. 

FUNDING 

The invention described herein was supported, in whole or in part, by the U.S. 
Department of Energy Program for Computational Molecular Biology. The United 
States government has certain rights in the invention. 

BACKGROUND OF THE INVENTION 

Gene expression is controlled by transcriptional regulatory proteins, which bind 
specific DNA sequences and recruit cofactors and the transcription apparatus to 
promoters (1-3). The expression of transcriptional regulators themselves is also 
regulated by transcriptional regulators, and a single gene may be regulated by multiple 
transcription factors. As a result of these regulatory networks, or pathways, 
misregulation of a single transcriptional regulator in a cell can result in the aberrant 
expression of multiple genes in the network in which the transcriptional regulator is 
active, leading to disease in the organism. 

Current methods of identifying the genes controlled by a transcriptional 
regulator typically include a comparison of the mRNA levels of candidate target genes in 
cells which express the transcriptional regulator and control cells which do not 
express it. Often, this involves overexpressing a recombinant transcriptional regulator 
in a given cell type and using, as a control cell, one which overexpresses a control 
recombinant protein or no recombinant protein at all. However, given the artificial 
nature of using cell lines and overexpressing transgenes, the results obtained from such 
approaches may not reflect the in vivo regulation by native transcriptional regulators in 
an organism. 

Genome-wide analysis methods have been used recently to determine how 
tagged transcriptional regulators encoded in Saccharomyces cerevisiae are associated 
with the genome in living yeast cells and to model the transcriptional regulatory 
circuitry of these cells (4). These methods have also been used in human tissue culture 
cells to identify target genes for several transcriptional regulators (5-7). 

However, the need remains to develop genome-scale analysis methods to 
determine how transcriptional regulators control the global gene expression programs 
that characterize specific tissues, and in particular, freshly isolated, primary tissues, in 
which the transcriptional regulators are likely to maintain their in vivo specificities. 
Furthermore, there is a need to identify the regulatory networks or pathways in which a 
given transcriptional activator acts, in part, to allow for the identification of therapeutic 
targets for diseases caused by aberrant function of a transcriptional regulator. 

SUMMARY OF THE INVENTION 

In one aspect, the invention provides a method of identifying the genes 
regulated by a transcriptional regulator. One aspect of the invention provides a method 
of determining which genes from a subset of genes are regulated by a transcriptional 
regulator in a cell, the method comprising (a) selectively isolating chromatin from a 
cell which expresses the transcriptional regulator to generate isolated chromatin; (b) 
selectively isolating chromatin fragments from the isolated chromatin to generate 
bound chromatin fragments, wherein the bound chromatin fragments are bound by the 
transcriptional regulator; (c) amplifying both the bound chromatin fragments to 
generate amplified chromatin fragments and the isolated chromatin to generate 
amplified control chromatin; (d) hybridizing the amplified control chromatin and the 
amplified chromatin fragments to a DNA microarray, wherein the DNA microarray 
comprises (1) at least 10,000 experimental spots, each experimental spot comprising an 
experimental DNA, each experimental DNA comprising a promoter region from a gene 
in the subset; and (2) at least 100 control spots, each control spot comprising a control 
DNA, each control DNA comprising a non-promoter region; and (e) determining and 
comparing a hybridization signal at each of the spots on the microarray between those 
generated by (1) the amplified control chromatin; and (2) the amplified chromatin 
fragments; wherein a gene in the subset is said to be regulated by the transcriptional 
regulator in the cell if a spot comprising a promoter region of said gene displays a 
higher level of hybridization by the amplified chromatin fragments than by the 
amplified control chromatin. 
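
By way of illustration only, the comparison in step (e) might be sketched as follows. The function name, the data layout, the median-based scaling against the control spots, and the `min_fold` cutoff are assumptions made for this sketch and are not taken from the method described above; signals are assumed to be positive.

```python
from statistics import median


def call_bound_genes(experimental, controls, min_fold=1.0):
    """Return genes whose promoter spot is enriched in the IP channel.

    experimental: dict mapping gene -> (ip_signal, control_signal) for
                  promoter spots (hypothetical layout).
    controls:     list of (ip_signal, control_signal) pairs for the
                  non-promoter control spots.
    min_fold:     hypothetical cutoff on the scaled IP/control ratio.
    """
    # Control spots should show no enrichment, so their median ratio is
    # used here as a scaling factor between the two channels (assumption).
    scale = median(ip / ctrl for ip, ctrl in controls)
    bound = set()
    for gene, (ip, ctrl) in experimental.items():
        if (ip / ctrl) / scale > min_fold:
            bound.add(gene)
    return bound
```

A stricter cutoff than 1.0 would normally be chosen in practice to allow for array noise; the value is left as a parameter here.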

In another aspect, the invention provides methods of identifying regulatory 
networks, or pathways, in a cell. The invention provides a method of identifying a 
transcriptional regulatory network in a cell, the method comprising determining if a 
transcriptional regulator regulates additional transcriptional regulators in the cell using 
any of the methods described herein, wherein a transcriptional 
regulatory network is identified if at least one additional transcriptional regulator is 
regulated by the transcriptional regulator. 

The invention also provides a method of identifying a transcriptional regulatory 
network in a cell, the method comprising determining if a transcriptional regulator 
regulates (i) its own promoter; or (ii) a promoter from a plurality of transcriptional 
regulators; using any of the methods described herein, wherein the experimental DNA 
comprises (a) a promoter from the transcriptional regulator; and (b) promoters from 
the plurality of transcriptional regulators; 
wherein a transcriptional regulatory network is identified if the transcriptional regulator 
regulates itself or if it regulates at least one of the plurality of transcriptional regulators. 

The invention further provides a method of identifying transcriptional 
regulatory networks in a cell, the method comprising (a) determining, by repeating a 



wo 2005/054461 



PCT/US2004/039805 



method of identifying the targets of a transcriptional regulator for each of a plurality of 
transcriptional regulators, the genes in a subset which are regulated by each of the 
plurality of transcriptional regulators, wherein the experimental DNA comprises 
promoter regions for each of the plurality of transcriptional regulators; (b) determining 
if any one of the plurality of transcriptional regulators is regulated by at least one of 
the plurality of transcriptional regulators; wherein a transcriptional regulatory network 
is identified if any one of the plurality of transcriptional regulators is regulated by at 
least one of the plurality of transcriptional regulators. 
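
As a minimal sketch of the network test described in the preceding paragraphs, the following assumes that promoter occupancy has already been determined for each regulator and is held as a mapping from regulator to the set of genes whose promoters it binds. The function name, the data layout and the regulator names in the usage note are hypothetical.

```python
def regulator_network(targets):
    """Find regulator-to-regulator edges and self-regulating regulators.

    targets: dict mapping each profiled regulator to the set of genes
             whose promoters it occupies (hypothetical layout).
    Returns (edges, autoregulatory), where edges is a set of
    (regulator, regulated_regulator) pairs.
    """
    regulators = set(targets)
    edges = {
        (reg, gene)
        for reg, genes in targets.items()
        for gene in genes
        if gene in regulators
    }
    # A regulator that occupies its own promoter is autoregulatory.
    autoregulatory = {reg for reg, gene in edges if reg == gene}
    return edges, autoregulatory


def network_identified(targets):
    """A network is identified if any regulator regulates a regulator."""
    edges, _ = regulator_network(targets)
    return bool(edges)
```

For toy input such as `{"regA": {"regB", "gene1"}, "regB": {"regB"}}`, the edges are `regA -> regB` and `regB -> regB`, and `regB` is reported as autoregulatory.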

The invention also provides a DNA microarray for determining promoter 
occupancy in a human cell, the microarray comprising (1) at least 10,000 experimental 
spots, each experimental spot comprising an experimental DNA, each experimental 
DNA comprising a promoter region from a human gene in the subset; and (2) at least 
100 control spots, each control spot comprising a control DNA, each control DNA 
comprising a non-promoter region; wherein at least 75% of the promoter regions 
comprise from at least 700 bp upstream to at least 200 bp downstream of the 
transcriptional start site. 
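
For illustration, the promoter window recited above could be computed from a transcriptional start site as in the sketch below. The function name and the coordinate convention (integer genomic positions, with upstream meaning lower coordinates on the "+" strand and higher coordinates on the "-" strand) are assumptions of this sketch.

```python
def promoter_window(tss, strand, upstream=700, downstream=200):
    """Return (start, end) genomic coordinates of a promoter region.

    tss:    position of the transcriptional start site.
    strand: "+" or "-"; upstream and downstream are relative to the
            direction of transcription.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")
```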

Another aspect of the invention provides a method of estimating if a 
transcriptional regulator is a global transcriptional regulator, the method comprising (a) 
selectively isolating chromatin from a tissue; (b) identifying promoter regions from the 
chromatin which are bound by a candidate global transcriptional regulator; 
(c) identifying promoter regions from the chromatin which are bound by a member of 
the basal transcriptional machinery; and (d) comparing the promoter regions identified 
in steps (b) and (c) to determine the ratio between (i) the number of promoter regions 
bound by both the candidate global transcriptional regulator and the member of the 
basal transcriptional machinery; and (ii) the number of promoter regions bound by the 
member of the basal transcriptional machinery, wherein a transcriptional regulator is a 
global transcriptional regulator when the ratio is greater than 0.2. 
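
The ratio of step (d) can be expressed compactly once the two sets of bound promoters are known. The sketch below assumes the promoters are held as Python sets of identifiers; the function name and the error handling are illustrative choices, while the 0.2 threshold is the one stated above.

```python
def global_regulator_ratio(regulator_bound, machinery_bound, threshold=0.2):
    """Return (ratio, is_global) for a candidate global regulator.

    regulator_bound: set of promoters bound by the candidate regulator.
    machinery_bound: set of promoters bound by a member of the basal
                     transcriptional machinery (e.g., RNA polymerase II).
    """
    if not machinery_bound:
        raise ValueError("no promoters bound by the basal machinery")
    both = regulator_bound & machinery_bound
    ratio = len(both) / len(machinery_bound)
    return ratio, ratio > threshold
```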

The invention further provides methods of identifying targets for therapeutics. 
In one aspect, the invention provides a method of identifying at least one target gene for 
the development of a therapeutic to treat or prevent a disorder in a subject, wherein at 
least one form of the disorder is caused by an altered activity in a transcriptional 
regulator or in a suspected transcriptional regulator, the method comprising (a) 
identifying the genes regulated by the transcriptional regulator in a cell; (b) 
determining if the transcriptional regulator is a broad-acting transcriptional regulator or 
a narrow-acting transcriptional regulator, wherein if the transcriptional regulator is a 
broad-acting transcriptional regulator then the transcriptional regulator is a target gene 
for the development of a therapeutic, and wherein if the transcriptional regulator is a 
narrow-acting transcriptional regulator then (i) determining if at least one gene 
regulated by the transcriptional regulator is likely causative in the disorder, wherein a 
gene that is likely causative in the disorder is a target gene for the development of a 
therapeutic; and (ii) reiterating steps (a) and (b) for at least one gene that is regulated by 
the transcriptional regulator in the cell and that either (1) encodes a transcriptional 
regulator or (2) is suspected to encode a transcriptional regulator, with the modification 
that the transcriptional regulator of steps (a) and (b) is said gene, thereby identifying at 
least one target gene for the development of a therapeutic to treat or prevent a disorder 
in the subject. 
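
The iterative procedure of steps (a) and (b) lends itself to a recursive formulation. The following is a sketch only: the four callables stand in for experimental determinations that the method leaves to the practitioner, all names are hypothetical, and the `_seen` set is an addition of this sketch to keep the recursion from cycling when regulators regulate one another.

```python
def therapeutic_targets(regulator, regulated_genes, is_broad,
                        is_causative, is_regulator, _seen=None):
    """Collect candidate target genes starting from one regulator.

    regulated_genes(r) -> iterable of genes regulated by r   (step a)
    is_broad(r)        -> True if r is broad-acting           (step b)
    is_causative(g)    -> True if g is likely causative       (step b.i)
    is_regulator(g)    -> True if g encodes, or is suspected to
                          encode, a transcriptional regulator (step b.ii)
    """
    seen = set() if _seen is None else _seen
    if regulator in seen:
        return set()
    seen.add(regulator)
    if is_broad(regulator):
        # A broad-acting regulator is itself a target gene.
        return {regulator}
    found = set()
    for gene in regulated_genes(regulator):
        if is_causative(gene):
            found.add(gene)
        if is_regulator(gene):
            # Reiterate steps (a) and (b) with this gene as the regulator.
            found |= therapeutic_targets(gene, regulated_genes, is_broad,
                                         is_causative, is_regulator, seen)
    return found
```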

The invention also provides methods of treating or preventing disease. In one 
aspect, the invention provides a method of treating or preventing type II diabetes in a 
subject, comprising administering to the subject a therapeutically effective amount of 
an agent that increases the global transcriptional activity of HNF4alpha. 

In another aspect, the invention provides a method of treating or preventing a 
disorder associated with low transcriptional activity of HNF4alpha in a subject, 
comprising administering to the subject a therapeutically effective amount of an agent 
that increases the global transcriptional activity of HNF4alpha. A related aspect 
provides a method of treating or preventing a disorder associated with high 
transcriptional activity of HNF4alpha in a subject, comprising administering to the 
subject a therapeutically effective amount of an agent that decreases the global 
transcriptional activity of HNF4alpha. 

The invention also provides a method of increasing the global transcriptional 
activity in a liver or a pancreatic cell comprising contacting the cell with an agent 
which increases the global transcriptional activity of HNF4alpha. A related aspect 
provides a method of decreasing the global transcriptional activity in a liver or a 
pancreatic cell comprising contacting the cell with an agent which decreases the global 
transcriptional activity of HNF4alpha. 

One aspect of the invention provides methods of regulating the expression level 
of genes. One aspect provides a method of regulating the expression level of any one of 
the genes in Figure 13 in a hepatocyte, the method comprising contacting the cell with 
an agent which regulates the transcriptional activity of HNF1alpha. A related aspect 
provides a method of regulating the expression level of any one of the genes in Figure 
14 in a pancreatic cell, the method comprising contacting the cell with an agent which 
regulates the transcriptional activity of HNF1alpha. 

Another aspect of the invention provides a method of regulating the expression 
level of any one of the genes in Figure 16 in a hepatocyte, the method comprising 
contacting the cell with an agent which regulates the transcriptional activity of HNF6. 
A related aspect provides a method of regulating the expression level of any one of the 
genes in Figure 17 in a pancreatic cell, the method comprising contacting the cell with 
an agent which regulates the transcriptional activity of HNF6. 

Yet another aspect of the invention provides a method of regulating the 
expression level of any one of the genes in Figure 18 in a hepatocyte, the method 
comprising contacting the cell with an agent which regulates the transcriptional activity 
of HNF4alpha. A related aspect provides a method of regulating the expression level 
of any one of the genes in Figure 19 in a pancreatic cell, the method comprising 
contacting the cell with an agent which regulates the transcriptional activity of 
HNF4alpha. 

The invention also provides methods for identifying transcriptionally active 
genes that are regulated by a transcriptional regulator in a cell. In one aspect, the 
invention provides a method of identifying transcriptionally active genes that are 
regulated by a transcriptional regulator in a cell, the method comprising (a) selectively 
isolating chromatin from a tissue; (b) identifying promoter regions from the chromatin 
that are bound by the transcriptional regulator; (c) identifying promoter regions from 
the chromatin that are bound by a member of the basal transcriptional machinery; and 
(d) comparing the promoter regions identified in steps (b) and (c) to determine 
overlapping genes, wherein the overlapping genes are transcriptionally active genes 
regulated by the transcriptional regulator. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1A-1C show genome-scale location analysis of HNF regulators in human 
tissues. (A) Hepatocytes and pancreatic islets were obtained from tissue distribution 
programs. These cells were treated with formaldehyde to covalently link transcription 
factors to DNA sites of interaction. Cells were harvested, and chromatin in cell lysates 
was sheared by sonication. The regulator-DNA complexes were enriched by chromatin 
immunoprecipitation with specific antibodies, the crosslinks were reversed, and 
enriched DNA fragments and control genomic DNA fragments were amplified using 
ligation-mediated PCR. The amplified DNA preparations, labeled with distinct 
fluorophores, were mixed and hybridized onto a promoter array. (B) Venn diagram 
showing the overlap of HNF1α, HNF6, and HNF4α bound promoters in hepatocytes 
(top) and pancreatic islets (bottom). (C) The collection of genes occupied by RNA 
polymerase II in hepatocytes is displayed as a circle, with the genes bound by 
HNF1α, HNF6, and HNF4α outlined collectively as a fraction of the chart. The 
relative contributions of HNF1α, HNF6, and HNF4α are shown as framing arcs. 

Figures 2A-2B show transcriptional regulatory networks and motifs. (A) HNF1α, 
HNF6 and HNF4α are at the center of tissue-specific transcriptional regulatory 
networks. In these examples selected for illustration, regulatory proteins and their gene 
targets are represented as circles and boxes, respectively. Solid arrows indicate 
protein-DNA interactions, and genes encoding regulators are linked to their protein products by 
dashed lines. The HNF4α7 promoter, also known as the P2 promoter (24, 25), was 
recently implicated as a major human diabetes susceptibility locus (see text). (B) 
Examples of regulatory network motifs in hepatocytes. For instance, in the 
multi-component loop, HNF1α protein binds to the promoter of the HNF4α gene, and the 
HNF4α protein binds to the promoter of the HNF1α gene. These network motifs were 
uncovered by searching binding data with various algorithms; for details on the 
algorithms used and a full list of motifs found, see (20). 

Figure 3 shows one embodiment of a strategy for the identification of at least one 
target gene of a master regulator for the development of a therapeutic to treat or prevent 
a disorder. 

Figure 4 shows a Venn diagram showing the overlap of two single, independent ChIP 
experiments using hepatocytes with anti-HNF4α antibodies sc-6556 and sc-8987. 

Figure 5 shows a Western blot of HNF4α in HepG2 cells using 50 μg of cell lysate 
protein with Ab sc-6556. The lower running band is approximately 50 kDa, which is 
the canonical molecular weight for HNF4α, and the higher running band is the 
appropriate location for HNF4α dimer. A very similar gel showing HNF4α antibody 
specificity for sc-6556 is available at the Santa Cruz website (www.scbt.com). 

Figures 6A-6D show scatterplots of attempted chromatin immunoprecipitations 
performed with the anti-HNF4α antibody sc-6556 using Jurkat (T-lymphocyte derived, 
6A), BJ-T (foreskin fibroblast derived, 6B), and U937 (histiocyte derived, 6C) cells. To 
demonstrate the noise inherent in the array analysis, applicants show a scatterplot of a 
sample of input DNA, split, labeled with the two fluorophores, and hybridized to an 
array (6D). Identical control experiments performed using the anti-HNF1α antibody 
sc-6547 afforded essentially identical results. 

Figure 7 shows a scatterplot of a chromatin immunoprecipitation performed with 
pre-immune commercial rabbit serum using hepatocytes (left). Goat pre-immune serum and 
two rabbit sera from different individuals gave a similar scatterplot. For comparison, 
applicants show the scatterplot for an equivalent ChIP with the anti-HNF4α antibody 
sc-6556 using hepatocytes (right). 

Figure 8 shows a Venn diagram showing the overlap of the sets of promoters bound by 
HNF4α and RNA Pol II in hepatocytes and pancreatic islets. 

Figure 9 shows a composite gel of gene-specific chromatin immunoprecipitation 
reactions using anti-HNF4α antibody sc-6556 with crosslinked human hepatocytes. 

Figure 10 shows a composite gel of gene-specific chromatin immunoprecipitation 
reactions using anti-HNF1α antibody sc-6547 with crosslinked human hepatocytes. 

Figure 11 shows a partial list of proximal promoters occupied by HNF1α in human 
hepatocytes and pancreatic islets. These genes were assigned to functional categories 
using the program ProtoGo; genes not in this automated GO ontology database were 
assigned using LocusLink information. Four genes are shown for each tissue/category 
combination; for some combinations, fewer than 4 promoters qualified as targets. 
Hypothetical and functionally uncharacterized genes are not shown. A complete list of 
targets is available in Figures 13 and 14. 

Figure 12 shows occupancy of BJ-T and tissue-specific promoter sets by HNF factors. 
(*) Indicates that comparisons between BJ-T and primary tissues used only a subset of 
Hu13K array promoters, as RNA Pol II was profiled in BJ-T cells using a smaller, 
prototype array. The denominator in the above fractions represents the number of 
targets the HNF factor of interest occupied in the set of RNA Pol II occupied promoters 
that are either BJ-T specific or primary tissue specific. 

Figure 13 shows HNF1α bound promoters in hepatocytes. 

Figure 14 shows HNF1α bound promoters in pancreatic islets. 

Figures 15A-15D show genes previously suggested to be regulated by HNF1α and 
HNF4α. 'Direct' binding is in vivo ChIP and in vivo footprinting, 'in vitro' binding is 
primarily gel mobility retardation assays and in vitro footprinting, and 'indirect' is 
primarily transient transfections. 'Sequence-based' uses a number of different criteria 
to qualify binding. Note that some duplicate reports are omitted, as are a handful of 
recent large-scale screens (e.g., Tronche 1997, Shih 2001, etc.). 

Figure 16 shows HNF6 bound promoters in hepatocytes. 

Figure 17 shows HNF6 bound promoters in pancreatic islets. 

Figures 18A-18C show HNF4α bound promoters in hepatocytes. 

Figures 19A-19C show HNF4α bound promoters in pancreatic islets. 

Figures 20A-20B show the feed forward regulatory motifs in hepatocytes. The 
regulatory modules here were derived as described in the exemplification. Feed forwards 
only involving HNF1α and HNF4α are also multi-input motifs, as they bind each other's 
promoters in a multicomponent loop. 

Figures 21A-21B show multi-input motifs in hepatocytes. The regulatory modules here 
were derived as described in the exemplification. MIMs for the HNF6/HNF4α and 
HNF1α/HNF4α are listed in Figure 20 as feedforward motifs. 

Figures 22A-22B show the feed forward regulatory motifs in pancreatic islets. The 
regulatory modules here were derived as described in Supporting Online Material. Feed 
forwards only involving HNF1α and HNF4α are also multi-input motifs, as they bind 
each other's promoters in a multicomponent loop. 

Figures 23A-23B show multi-input motifs in pancreatic islets. The regulatory modules 
here were derived as described in Supporting Online Material. MIMs for the 
HNF6/HNF4α and HNF1α/HNF4α are listed in Figure 22 as feedforward motifs. 

Figures 24A-24B show transcriptional regulators occupied by HNF1α and HNF4α. 
Network of DNA regulators downstream of HNF1α and HNF4α in hepatocytes and 
islets. Target genes that are among the Gene Ontology "DNA-regulators" category 
were compiled, and are listed according to functional subcategory. 

DETAILED DESCRIPTION OF THE INVENTION 

I. Overview 

In certain aspects, the invention provides methods related to transcriptional 
regulators. Some aspects of the invention provide methods for the identification of 
genes whose transcription is regulated by a specific transcriptional regulator in a cell. 
Some of these methods comprise determining the promoter occupancy of the 
transcriptional regulator using a combination of chromatin immunoprecipitation and/or 
DNA microarray analysis of the promoter regions that are physically associated with 
the transcriptional regulator in the cell. In some embodiments of the methods described 
herein, the DNA microarray comprises both experimental spots containing promoter 
DNA, and control spots containing non-promoter DNA. The methods described herein 
may be applied to any cell type, including transplant grade primary human tissue. 
Furthermore, the methods described herein can be used to compare the function of 
transcriptional regulators across cell types, or across two populations, such as healthy 
and disease-afflicted subjects. 

In a related aspect, the invention provides methods of identifying regulatory 
networks, or pathways. Some methods comprise identifying the transcriptional 
regulators which are regulated by a given transcriptional regulator, and optionally, 
determining the genes that are regulated by those transcriptional regulators. Pathways 
that may be identified using the methods described herein include autoregulatory, 
multicomponent, feed-forward, and multi-component loops, as well as regulatory 
chains. 
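
As one possible illustration of how such pathways might be searched for in promoter occupancy data, the sketch below looks for two of the pathway types named above: multicomponent loops (two regulators that bind each other's promoters) and feed-forward arrangements (a regulator that binds a second regulator's promoter, with both binding a common target). The function name and data layout are hypothetical, and this is not presented as the algorithm actually used to derive the motifs shown in the figures.

```python
from itertools import permutations


def find_motifs(binds):
    """Search occupancy data for simple regulatory motifs.

    binds: dict mapping each regulator to the set of genes whose
           promoters it occupies (hypothetical layout).
    Returns (loops, feed_forward): loops is a set of frozensets {a, b};
    feed_forward is a set of (a, b, target) tuples where a binds b and
    both a and b bind target.
    """
    regs = set(binds)
    loops = {
        frozenset((a, b))
        for a, b in permutations(regs, 2)
        if b in binds[a] and a in binds[b]
    }
    feed_forward = {
        (a, b, target)
        for a, b in permutations(regs, 2)
        if b in binds[a]
        for target in binds[a] & binds[b]
        if target not in (a, b)
    }
    return loops, feed_forward
```

When two regulators form a multicomponent loop, each shared target is reported as a feed-forward motif in both orientations.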

The invention also provides methods of determining if a transcriptional 
regulator is a global transcriptional regulator. In some aspects, such methods comprise 
determining the promoter occupancy of both a transcriptional regulator and a member 
of the basal transcriptional machinery. Comparison of the promoter occupancy by the 
transcriptional regulator and by the member of the basal transcriptional machinery 
allows the identification of transcriptionally active promoters that are bound and 
regulated by the transcriptional regulator. Other methods further comprise extrapolating 
from the set of promoters that were examined to the total number of promoters in the 
genome to determine the approximate number of transcriptionally active promoters in a 
cell that are under the control of a specific transcriptional factor or to determine if the 
transcriptional regulator is a global transcriptional regulator. 
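
The extrapolation step amounts to simple proportional scaling. The sketch below assumes that the promoters examined on the array are a representative sample of the promoters in the genome; the function name and any figures supplied to it are illustrative and do not come from the data described herein.

```python
def extrapolate_bound_promoters(bound_on_array, promoters_on_array,
                                promoters_in_genome):
    """Estimate the genome-wide number of promoters bound by a regulator.

    Scales the fraction of examined promoters that are bound up to the
    total number of promoters in the genome (assumes the examined set
    is representative).
    """
    if promoters_on_array <= 0:
        raise ValueError("promoters_on_array must be positive")
    return round(bound_on_array * promoters_in_genome / promoters_on_array)
```

For example, if 10% of the examined promoters are bound, the estimate is 10% of the genome-wide promoter count.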

Other aspects of the invention provide methods of identifying therapeutic 
targets to treat disease. One specific aspect of the invention relates to identifying at 
least one target gene for the development of a therapeutic agent to treat or prevent a 
disorder in a subject, preferably a disorder in which at least one form of the disorder is 
caused by an altered activity in a transcriptional regulator or in a gene suspected to 
encode a transcriptional regulator. Some of the methods provided herein to identify 
therapeutic targets comprise determining if a transcriptional regulator implicated in the 
disease is a broad-acting or a narrow-acting transcriptional regulator, such as by 
identifying at least a subset of the genes that it regulates in a cell, wherein broad-acting 
transcriptional regulators are targets for therapeutic agents. If the transcriptional 
regulator is narrow-acting, then the genes that it regulates may be examined further to 
determine if any are broad-acting transcriptional regulators (for those genes encoding 
transcriptional regulators) or if any of the genes are causative to the disease state, i.e., 
they regulate a pathway or network that is impaired in the disease state. 

The invention further provides methods for the treatment of disease. Some 
aspects of the invention provide methods of treating metabolic disorders, such as type II 
diabetes. Specific aspects of the invention provide methods of treating or preventing 
type II diabetes in a subject by administering to the subject a therapeutically effective 
amount of an agent that increases the global transcriptional activity of HNF4α. 
Furthermore, the invention provides methods for modulating the expression level of 
genes. Such methods are based, in part, on the finding by Applicants of genes which 
are transcriptionally regulated by HNF1α, HNF4α or HNF6 in hepatocytes and 
pancreatic cells. In a related aspect, the invention provides methods of modulating the 
expression level of, and alleviating a disease state associated with the abnormal 
expression of, the genes in Figures 13-19 by modulating the transcriptional activity or 
expression of HNF1α, HNF4α or HNF6. In specific embodiments, the expression of 
the genes is modulated in hepatocytes, pancreatic cells, or both. 

II. Definitions 

For convenience, certain terms employed in the specification, examples, and 
appended claims, are collected here. Unless defined otherwise, all technical and 
scientific terms used herein have the same meaning as commonly understood by one of 
ordinary skill in the art to which this invention belongs. 

The articles "a" and "an" are used herein to refer to one or to more than one 
(i.e., to at least one) of the grammatical object of the article. By way of example, "an 
element" means one element or more than one element. 

The term "including" is used herein to mean, and is used interchangeably with, 
the phrase "including but not limited to". 

The term "or" is used herein to mean, and is used interchangeably with, the term 
"and/or," unless context clearly indicates otherwise. 

The term "such as" is used herein to mean, and is used interchangeably with, the 
phrase "such as but not limited to". 

A "patient" or "subject" to be treated by the method of the invention can mean 
either a human or non-human animal, preferably a mammal. 

The terms "alpha" and "α" are used interchangeably, as are the terms "beta" and 
"β". 

The term "encoding" comprises an RNA product resulting from transcription of 
a DNA molecule, a protein resulting from the translation of an RNA molecule, or a 
protein resulting from the transcription of a DNA molecule and the subsequent 
translation of the RNA product. 

A "promoter" is a nucleic acid sequence that directs transcription of a nucleic 
acid. A promoter includes nucleic acid sequences near the start site of transcription, 
e.g., a TATA box; see, e.g., Butler and Kadonaga (2002) Genes Dev. 16:2583-2592; 
Georgel (2002) Biochem. Cell Biol. 80:295-300. A promoter also optionally includes 
distal enhancer or repressor elements, which can be located as much as several 
thousand base pairs on either side from the start site of transcription. A "constitutive" 
promoter is a promoter that is active under most environmental and developmental 
conditions, while an "inducible" promoter is a promoter that is active or activated under, 
e.g., specific environmental or developmental conditions. 

The term "expression" is used herein to mean the process by which a 
polypeptide is produced from DNA. The process involves the transcription of the gene 
into mRNA and the translation of this mRNA into a polypeptide. Depending on the 
context in which used, "expression" may refer to the production of RNA, protein or 
both. 

The term "recombinant" is used herein to mean any nucleic acid comprising 
sequences which are not adjacent in nature. A recombinant nucleic acid may be 
generated in vitro, for example by using the methods of molecular biology, or in vivo, 
for example by insertion of a nucleic acid at a novel chromosomal location by 
homologous or non-homologous recombination. 

The term "transcriptional regulator" refers to a biochemical element that acts to 
prevent or inhibit the transcription of a promoter-driven DNA sequence under certain 
environmental conditions (e.g., a repressor or nuclear inhibitory protein), or to permit 
or stimulate the transcription of the promoter-driven DNA sequence under certain 
environmental conditions (e.g., an inducer or an enhancer). 

The term "microarray" refers to an array of distinct polynucleotides or 
oligonucleotides synthesized on a substrate, such as paper, nylon or other type of 
membrane, filter, chip, glass slide, or any other suitable solid support. 

The terms "disorders" and "diseases" are used inclusively and refer to any 
deviation from the normal structure or function of any part, organ or system of the body 
(or any combination thereof). A specific disease is manifested by characteristic 
symptoms and signs, including biological, chemical and physical changes, and is often 
associated with a variety of other factors including, but not limited to, demographic, 
environmental, employment, genetic and medically historical factors. Certain 
characteristic signs, symptoms, and related factors can be quantitated through a variety 
of methods to yield important diagnostic information. 

The terms "level of expression of a gene in a cell" or "gene expression level" 
refer to the level of mRNA, as well as pre-mRNA nascent transcript(s), transcript 
processing intermediates, mature mRNA(s) and degradation products, encoded by the 
gene in the cell. 

The term "modulation" refers to upregulation (i.e., activation or stimulation), 
downregulation (i.e., inhibition or suppression) of a response, or the two in combination 
or apart. A "modulator" is a compound or molecule that modulates, and may be, e.g., 
an agonist, antagonist, activator, stimulator, suppressor, or inhibitor. 

The term "agonist" refers to an agent that mimics or up-regulates (e.g., 
potentiates or supplements) the bioactivity of a protein, e.g., polypeptide X. An agonist 
may be a wild-type protein or derivative thereof having at least one bioactivity of the 
wild-type protein. An agonist may also be a compound that upregulates expression of a 
gene or which increases at least one bioactivity of a protein. An agonist may also be a 
compound which increases the interaction of a polypeptide with another molecule, e.g., 
a target peptide or nucleic acid. 

The term "antagonist" refers to an agent that downregulates (e.g., suppresses or 
inhibits) at least one bioactivity of a protein. An antagonist may be a compound which 
inhibits or decreases the interaction between a protein and another molecule, e.g., a 
target peptide or enzyme substrate. An antagonist may also be a compound that 
downregulates expression of a gene or which reduces the amount of expressed protein 
present. 

The term "prophylactic" or "therapeutic" treatment refers to administration to 
the subject of one or more of the subject compositions. If it is administered prior to 
clinical manifestation of the unwanted condition (e.g., disease or other unwanted state 
of the host animal) then the treatment is prophylactic, i.e., it protects the host against 
developing the unwanted condition, whereas if administered after manifestation of the 
unwanted condition, the treatment is therapeutic (i.e., it is intended to diminish, 
ameliorate or maintain the existing unwanted condition or side effects therefrom). 

The term "therapeutic effect" refers to a local or systemic effect in animals, 
particularly mammals, and more particularly humans, caused by a pharmacologically 
active substance. The term thus means any substance intended for use in the diagnosis, 
cure, mitigation, treatment or prevention of disease or in the enhancement of desirable 
physical or mental development and conditions in an animal or human. The phrase 
"therapeutically-effective amount" means that amount of such a substance that 
produces some desired local or systemic effect at a reasonable benefit/risk ratio 
applicable to any treatment. In certain embodiments, a therapeutically-effective amount 
of a compound will depend on its therapeutic index, solubility, and the like. For 
example, certain compounds discovered by the methods of the present invention may 
be administered in a sufficient amount to produce a reasonable benefit/risk ratio 
applicable to such treatment. 


A probe that is "labeled" is detectable, either directly or indirectly, by 
spectroscopic, photochemical, biochemical, immunochemical, isotopic, or chemical 
means. For example, useful labels include 32P, 35S, 14C, 3H, 125I, stable isotopes, 
fluorescent dyes and fluorettes (Rozinov and Nolan (1998) Chem. Biol. 5:713-728; 
Molecular Probes, Inc. (2003) Catalogue, Molecular Probes, Eugene, Oreg.), electron- 
dense reagents, enzymes and/or substrates, e.g., as used in enzyme-linked 
immunoassays as with those using alkaline phosphatase or horseradish peroxidase. The 
label or detectable moiety is typically bound, either covalently, through a linker or 
chemical bond, or through ionic, van der Waals or hydrogen bonds to the molecule to 
be detected. "Radiolabeled" refers to a compound to which a radioisotope has been 
attached through covalent or non-covalent means. A "fluorophore" is a compound or 
moiety that absorbs radiant energy of one wavelength and emits radiant energy of a 
second, longer wavelength. 

A "labeled nucleic acid probe or oligonucleotide" is one that is bound, either 
covalently, through a linker or a chemical bond, or noncovalently, through ionic, van 
der Waals, electrostatic, or hydrogen bonds to a label such that the presence of the 
probe can be detected by detecting the presence of the label bound to the probe. The 
probes are preferably directly labeled as with isotopes, chromophores, fluorophores, 
chromogens, or indirectly labeled such as with biotin to which a streptavidin complex 
or avidin complex can later bind. 

A "nucleic acid probe" is a nucleic acid capable of binding to a target nucleic 
acid of complementary sequence, usually through complementary base pairing, e.g., 
through hydrogen bond formation. A probe may include natural, e.g., A, G, C, or T, or 
modified bases, e.g., 7-deazaguanosine, inosine, etc. The bases in a probe can be joined 
by a linkage other than a phosphodiester bond. Probes can be peptide nucleic acids in 
which the constituent bases are joined by peptide bonds rather than phosphodiester 
linkages. It will be understood by one of skill in the art that probes may bind target 
sequences lacking complete complementarity with the probe sequence depending upon 
the stringency of the hybridization conditions. 

"Small molecule" is defined as a molecule with a molecular weight that is less 
than 10 kD, typically less than 2 kD, and preferably less than 1 kD. Small molecules 
include, but are not limited to, inorganic molecules, organic molecules, organic 
molecules containing an inorganic component, molecules comprising a radioactive 
atom, synthetic molecules, peptide mimetics, and antibody mimetics. As a therapeutic, 
a small molecule may be more permeable to cells, less susceptible to degradation, and 
less apt to elicit an immune response than large molecules. Small molecule toxins are 
described, see, e.g., U.S. Pat. No. 6,326,482 issued to Stewart, et al. 

A small molecule refers to a composition which has a molecular weight of less than 
about 1000 kDa. 

III. Identification of Transcriptional Targets and Transcriptional Networks 

One aspect of the invention provides a method of determining which genes from 
a subset of genes are regulated by a transcriptional regulator in a cell, the method 
comprising (a) selectively isolating chromatin from a cell which expresses the 
transcriptional regulator to generate isolated chromatin; (b) selectively isolating 
chromatin fragments from the isolated chromatin to generate bound chromatin 
fragments, wherein the bound chromatin fragments are bound by the transcriptional 
regulator; (c) amplifying both the bound chromatin fragments to generate amplified 
chromatin fragments and the isolated chromatin to generate amplified control 
chromatin; (d) hybridizing the amplified control chromatin and the amplified 
chromatin fragments to a DNA microarray, wherein the DNA microarray comprises 
(1) at least 10,000 experimental spots, each experimental spot comprising an 
experimental DNA, each experimental DNA comprising a promoter region from a gene 
in the subset; and (2) at least 100 control spots, each control spot comprising a control 
DNA, each control DNA comprising a non-promoter region; and (e) determining and 
comparing a hybridization signal at each of the spots on the microarray between those 
generated by (1) the amplified control chromatin; and (2) the amplified chromatin 
fragments; wherein a gene in the subset is said to be regulated by the transcriptional 
regulator in the cell if a spot comprising a promoter region of said gene displays a 
higher level of hybridization by the amplified chromatin fragments than by the 
amplified control chromatin. 

Methods of isolating chromatin, and in particular chromatin fragments that are 
bound by a transcriptional regulator, may be carried out by any method known to one 
skilled in the art, including by cross-linking the transcriptional regulator to chromatin, 
fragmenting the chromatin, and immunoprecipitating the transcriptional regulators. 

In a preferred embodiment, the chromatin fragments bound by the 
transcriptional regulator are isolated using chromatin immunoprecipitation (ChIP). 
Briefly, this technique involves the use of a specific antibody to immunoprecipitate 
chromatin complexes comprising the corresponding antigen, i.e., the transcriptional 
regulator, and examination of the nucleotide sequences present in the 
immunoprecipitate. Immunoprecipitation of a particular sequence by the antibody is 
indicative of interaction of the antigen with that sequence. See, for example, O'Neill et 
al. in Methods in Enzymology, Vol. 274, Academic Press, San Diego, 1999, pp. 189- 
197; Kuo et al. (1999) Methods 19:425-433; and Ausubel et al., supra, Chapter 21. 

In one embodiment, the chromatin immunoprecipitation technique is applied as 
follows. Cells which express the transcriptional regulator of interest, such as a native 
transcriptional regulator or a recombinant transcriptional regulator, are treated with an 
agent that crosslinks the transcriptional regulator to chromatin if that transcriptional 
regulator is stably bound to it. In one embodiment of the methods described herein, the 
crosslinking is formaldehyde crosslinking (Solomon, M.J. and Varshavsky, A., Proc. 
Natl. Acad. Sci. USA 82:6470-6474; Orlando, V., TIBS, 25:99-104). UV light may also be 
used (Pashev et al. Trends Biochem Sci. 1991;16(9):323-6; Zhang L et al. Biochem 
Biophys Res Commun. 2004;322(3):705-11). 

Subsequent to crosslinking, cellular nucleic acid is isolated, sheared such as by 
sonication and incubated in the presence of an antibody directed against the 
transcriptional regulator. Antibody-antigen complexes are precipitated, crosslinks are 
reversed (for example, formaldehyde-induced DNA-protein crosslinks can be reversed 
by heating), and the sequence content of the immunoprecipitated DNA is tested for 
the presence of a specific sequence, for example, promoter regions. The antibody may 
bind directly to an epitope on the transcriptional regulator or it may bind to a tag on the 
regulator, such as a myc tag when used with an anti-Myc antibody (Santa Cruz 
Biotechnology, sc-764). 

In yet another embodiment, a non-antibody agent with affinity for the 
transcriptional regulator or for a tag fused to it is used in place of the antibody. For 
example, if the transcriptional regulator comprises an affinity tag, such as a six- 
histidine tag, complexes may be isolated by affinity chromatography to nickel- 
containing sepharose. Additional variations on ChIP methods within the scope of the 
invention may be found in Kurdistani et al. Methods. 2003;31(1):90-5; O'Neill et al. 
Methods. 2003;31(1):76-82; Spencer et al. Methods. 2003;31(1):67-75; and Orlando 
et al. Methods 11:205-214 (1997). 

In an alternate embodiment of the methods described herein for identifying 
genes regulated by a transcriptional regulator, amplified chromatin fragments from a 
control immunoprecipitation reaction are used in place of the isolated chromatin as a 
control. For example, an antibody that does not react with the transcription factor being 
tested may be used in a chromatin IP procedure to isolate control chromatin, which can 
then be compared to the chromatin isolated using an antibody that does react with the 
transcriptional regulator. In preferred embodiments, the antibody that does not react 
with the transcription factor being tested also does not react with other transcriptional 
regulators or DNA binding proteins. 



In one embodiment, the amplified control chromatin and the amplified 
chromatin fragments are generated from their corresponding template DNA using 
ligation-mediated polymerase chain reaction (LM-PCR) (e.g., see Current Protocols in 
Molecular Biology, Ausubel, F. M. et al., eds. 1991, and U.S. Application No. 
2003/0143599, the teachings of which are incorporated herein by reference in their 
entirety). In specific embodiments, LM-PCR comprises fluorescently labeling amplified 
DNA by including fluorescently-tagged nucleotides in the LM-PCR reaction. 
Additional variations for manipulating and examining chromatin using microarrays 
have been described in U.S. Patent No. 6,410,243, the teachings of which are incorporated 
herein by reference. 

In one embodiment, the labeled or unlabeled probes are hybridized to a DNA 
microarray, such as is described in U.S. Patent No. 6,410,243. Microarrays, also called 
"biochips" or "arrays", are miniaturized devices typically with dimensions in the 
micrometer to millimeter range for performing chemical and biochemical reactions and 
are particularly suited for embodiments of the invention. Arrays may be constructed via 
microelectronic and/or microfabrication using essentially any and all techniques known 
and available in the semiconductor industry and/or in the biochemistry industry, 
provided only that such techniques are amenable to and compatible with the deposition 
and screening of polynucleotide sequences. Microarrays are particularly desirable for 
their virtues of high sample throughput and low cost for generating profiles and other 
data. 

In one embodiment of the methods described, amplified control chromatin and 
the amplified chromatin fragments are hybridized to a DNA microarray that includes 
experimental spots that represent all or a subset (e.g., a chromosome or chromosomes) 
of the genome. The fluorescent intensity of each experimental spot on the microarray 
from the amplified chromatin fragments relative to the amplified control chromatin 
indicates whether the protein of interest is bound to the DNA region located at that 
particular spot. Hence, the methods described herein allow the detection of protein- 
DNA interactions across an entire genome. 

In some embodiments of the methods described herein, the promoter region of a 
gene comprises from at least 700 bp upstream to at least 200 bp downstream of the 
transcriptional start site of the gene. In some embodiments, the promoter region 
comprises at least about 30, 40, 50, or 60 nucleotides in length. In specific 
embodiments, the promoter region of a gene as found on the spots of the microarray 
comprises a sequence of at least 30 nucleotides whose sequence is identical to a region 
stretching from 3 kb upstream to 1 kb downstream of the transcriptional start site of 
said gene. In some embodiments, the DNA microarray includes control spots of non- 
promoter DNA. In specific embodiments, the non-promoter region comprises an open 
reading frame. In preferred embodiments, the non-promoter regions comprise genomic 
regions which are not bound by transcriptional regulators, and preferably which are not 
bound by the transcriptional regulator being tested. In some embodiments, not all the 
experimental spots or the control spots comprise experimental DNA or control DNA 
respectively. Furthermore, in some specific embodiments some spots comprise control 
DNA which comprises promoter DNA. One skilled in the art may determine the 
number of experimental or control spots for a given application. 

In some embodiments of the methods described herein, the level of 
hybridization of the amplified chromatin fragments to the experimental spots is 
normalized by the level of hybridization of the amplified chromatin fragments to the 
control spots. In specific embodiments, the normalization is performed by subtracting 
the mean level of hybridization of the amplified chromatin fragments to the control 
spots from the level of hybridization of the amplified chromatin fragments at each 
experimental spot. 
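
By way of illustration only, the subtraction-based normalization described above may be sketched in Python as follows. The function name, the dictionary layout, and the signal values are hypothetical and are not part of the methods described herein; the sketch assumes one signal value per spot.

```python
from statistics import mean


def normalize_by_control_spots(experimental, control):
    """Subtract the mean control-spot signal from each experimental spot.

    experimental: dict mapping spot id -> hybridization signal of the
                  amplified chromatin fragments at that experimental spot
    control:      list of signals of the same sample at the control spots
    """
    background = mean(control)
    return {spot: signal - background for spot, signal in experimental.items()}


# Hypothetical signals, for illustration only.
ip_experimental = {"promoter_A": 950.0, "promoter_B": 210.0}
ip_control = [180.0, 200.0, 220.0]
normalized = normalize_by_control_spots(ip_experimental, ip_control)
```

In this sketch, spots whose normalized signal remains near zero hybridize no more strongly than the non-promoter control spots.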

Methods of analyzing data from microarrays are well-described in the art, 
including in DNA Microarrays: A Molecular Cloning Manual, ed. by Bowtell and 
Sambrook (Cold Spring Harbor Laboratory Press, 2002); Microarrays for an Integrative 
Genomics by Kohane (MIT Press, 2002); A Biologist's Guide to Analysis of DNA 
Microarray Data, by Knudsen (Wiley, John & Sons, Incorporated, 2002); DNA 
Microarrays: A Practical Approach, Vol. 205 by Schena (Oxford University Press, 
1999); and Methods of Microarray Data Analysis II, ed. by Lin et al. (Kluwer Academic 
Publishers, 2002), hereby incorporated by reference in their entirety. 


In some embodiments of any of the methods described herein, the 
transcriptional regulator is native to the cell. By native it is meant that the 
transcriptional regulator naturally occurs in the cell. In other embodiments, the 
transcriptional regulator is a recombinant transcriptional regulator. In some 
embodiments, the transcriptional regulator originates from a species which is different 
from that of the cell. In some embodiments, the transcriptional regulator is a viral 
transcriptional regulator. In such embodiments, a cell may be contacted with a virus 
and chromatin extracted from the infected cell after allowing sufficient time for the 
viral proteins to be expressed. In some embodiments, recombinant transcriptional 
regulators have missense mutations, truncations, or inserted sequences or entire 
domains from other naturally occurring proteins. A tagged recombinant transcriptional 
regulator may be used in some embodiments of the methods of the present invention, as 
the tag may facilitate the immunoprecipitation of the regulator. 

In certain embodiments of the invention, transcriptional regulators comprise 
specific transcription factors, coactivators, corepressors or complexes thereof. 
Transcription factors bind to specific cognate DNA elements such as promoters, 
enhancers and silencer elements, and are responsible for regulating gene expression. 
Transcription factors may be activators of transcription, repressors of transcription or 
both, depending on the cellular context. Transcription factors may belong to any class 
or type of known or identified transcription factor. Examples of known families or 
structurally-related transcription factors include helix-loop-helix, leucine zipper, zinc 
finger, ring finger, and hormone receptors. Transcription factors may also be selected 
based upon their known association with a disease or the regulation of one or more 
genes. For example, transcription factors such as c-myc, Rel/NF-kB, neuroD, c-fos, c- 
jun, and E2F may be targeted. Antibodies directed to any transcriptional coactivator or 
corepressor may also be used according to the invention. Examples of specific 
coactivators include CBP, CTIIA, and SRA, while specific examples of corepressors 
include the mSin3 proteins, MITR, and LEUNIG. Furthermore, the genes regulated by 
proteins associated with transcriptional complexes, such as the histone acetylases 
(HATs) and histone deacetylases (HDACs), may also be determined using the methods 
described herein. 

In one embodiment of the methods described herein, the cell is a primary cell. 
Primary cells are directly isolated from an organism and have undergone minimum 
passaging in vitro, and thus maintain most of the phenotypic characteristics of cells in 
the organism. In a specific embodiment, the primary cells are primary cells that have 
doubled less than 10 times ex vivo. In some embodiments, the cell is derived from 
transplant grade tissue or freshly isolated tissue. The cell type used in the assays 
described herein may be any cell type. The cell may be eukaryotic or prokaryotic, from 
a metazoan or from a single-celled organism such as yeast. In some preferred 
embodiments the cell is a mammalian cell, such as a cell from a rodent, a primate or a 
human. The cell may be a wild-type cell or a cell that has been genetically modified by 
recombinant means or by exposure to mutagens. The cell may be a transformed cell or 
an immortalized cell. In some embodiments, the cell is from an organism afflicted by a 
disease. In some embodiments, the cell comprises a genetic mutation that results in 
disease, such as in a hyperplastic condition. 

In some embodiments, the cell is derived from transplant-grade tissue or freshly 
isolated tissue. In some embodiments, the cell is derived from a tissue biopsy, such as 
from a subject afflicted with, or suspected of being afflicted with, a disorder. In 
another embodiment, the cell is isolated from a bodily fluid or bodily secretion, 
including serum, plasma, saliva, tears, sweat, semen, amniotic fluid, vaginal secretions, 
nasal secretions, synovial fluid, spinal fluid, phlegm, bronchoalveolar lavage fluid, 
blister fluid, pus, stool and intracranial fluid. The cell may be a live cell or a cell that 
has been preserved, such as by treatment with formalin, B5, Zenker's fixatives, Lugol's 
solution, Carnoy's Fixative, FX 3 fixative, or other preservatives, or a cell that has been 
preserved by freezing. 

In some embodiments of the methods described herein, the cell has been treated 
with an agent, such as a compound or a drug, prior to isolation of chromatin. Some 
preferred agents include those which bind to or regulate the expression of 
transcriptional regulators. In some embodiments, the genes that are regulated by a 
given transcriptional regulator are determined both in a cell that is contacted with an 
agent and in a cell that is not contacted with the agent, or that is contacted with a 
different amount of the agent. Such methods may be used to identify compounds that 
alter the types of genes and/or the extent to which a transcriptional regulator controls 
transcription of those genes. Furthermore, such approaches may be used to screen for 
agents which alter the activity, specificity or expression of a transcriptional regulator. 

In some embodiments of the methods described herein for identifying genes 
regulated by a transcriptional regulator, a higher level of hybridization by the amplified 
chromatin fragments than by the amplified control chromatin comprises at least a two- 
fold higher level of hybridization. The threshold for what constitutes a higher level of 
hybridization may be adjusted by one skilled in the art for the particular application. 
Higher thresholds are expected to yield a smaller set of targets but with higher 
certainty that a given gene above that threshold is regulated by the transcriptional 
regulator in that cell in vivo. 
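
The effect of the threshold may be illustrated with a minimal Python sketch, which assumes that each gene is represented by a single promoter spot and that enrichment is expressed as a simple ratio of the two signals. The function name, the parameter `min_fold`, and the signal values are hypothetical.

```python
def call_bound_promoters(ip_signal, control_signal, min_fold=2.0):
    """Return the genes whose promoter spot is enriched at least min_fold.

    ip_signal / control_signal: dicts mapping gene -> hybridization signal
    of the amplified chromatin fragments and of the amplified control
    chromatin at that gene's promoter spot.
    """
    bound = set()
    for gene, ip in ip_signal.items():
        ctrl = control_signal.get(gene)
        if ctrl is None or ctrl <= 0:
            continue  # no usable control measurement for this spot
        if ip / ctrl >= min_fold:
            bound.add(gene)
    return bound


# Hypothetical signals, for illustration only.
ip = {"gene1": 900.0, "gene2": 300.0, "gene3": 120.0}
ctrl = {"gene1": 200.0, "gene2": 150.0, "gene3": 100.0}
loose = call_bound_promoters(ip, ctrl, min_fold=2.0)
strict = call_bound_promoters(ip, ctrl, min_fold=4.0)
```

With these values the two-fold threshold retains gene1 and gene2, whereas the four-fold threshold retains only gene1, consistent with a stricter threshold yielding fewer targets.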

In other embodiments of the methods described herein for identifying genes 
regulated by a transcriptional regulator, the transcriptional regulator is a basal 
transcription factor or a component of the basal transcription machinery. In specific 
embodiments, components of the basal transcription machinery comprise RNA 
polymerases, including pol I, pol II and pol III, TBP, NTF-1 and Sp1 and any other 
component of TFIID, including, for example, the TAFs (e.g. TAF250, TAF150, 
TAF135, TAF95, TAF80, TAF55, TAF31, TAF28, and TAF20), or any other 
component of a polymerase holoenzyme. 

Another aspect of the invention provides a method of identifying 
transcriptionally active genes that are regulated by a transcriptional regulator in a cell. 
The method comprises determining what genes are regulated by the transcriptional 
regulator and determining which ones are transcriptionally active in the cell. In one 
embodiment, a set of genes which are transcriptionally active is the set of genes whose 
promoters are bound by an RNA polymerase, such as RNA polymerase II, or by a 
member of the basal transcription machinery. Alternatively, genes which are 
transcriptionally active may be identified using other techniques known in the art. For 
example, mRNA from a cell which expresses the transcriptional regulator can be 
collected and examined on a DNA microarray which comprises coding sequences in 
order to determine which genes are being transcribed. 

In one embodiment, the invention provides a method of identifying 
transcriptionally active genes that are regulated by a transcriptional regulator in a cell, 
the method comprising (a) selectively isolating chromatin from a tissue; (b) identifying 
promoter regions from the chromatin that are bound by the transcriptional regulator; (c) 
identifying promoter regions from the chromatin that are bound by a member of the 
basal transcriptional machinery; and (d) comparing the promoter regions identified in 
steps (b) and (c) to determine overlapping genes, wherein the overlapping genes are 
transcriptionally active genes regulated by the transcriptional regulator. 

In a related aspect, the invention provides methods to determine if a 
transcriptional regulator is a global transcription regulator. One method comprises 
estimating if a transcriptional regulator is a global transcriptional regulator, the method 
comprising (a) selectively isolating chromatin; (b) identifying promoter 
regions from the chromatin which are bound by a candidate global transcriptional 
regulator; (c) identifying promoter regions from the chromatin which are bound by a 
member of the basal transcriptional machinery; and (d) comparing the promoter regions 
identified in steps (b) and (c) to determine the ratio between (i) the number of promoter 
regions bound by both the candidate global transcriptional regulator and the member of 
the basal transcriptional machinery; and (ii) the number of promoter regions bound by 
the member of the basal transcriptional machinery, wherein a transcriptional regulator is 
a global transcriptional regulator when the ratio is greater than 0.2. 
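
The two comparisons described above, namely the overlap that identifies transcriptionally active targets and the ratio used to estimate whether a regulator is global, may be sketched in Python as follows. This is an illustrative sketch only; the function names and the promoter identifiers are hypothetical, and each promoter region is assumed to be represented by a unique identifier.

```python
def active_targets(bound_by_regulator, bound_by_basal):
    """Promoters bound by both the regulator and the basal machinery."""
    return set(bound_by_regulator) & set(bound_by_basal)


def global_regulator_ratio(bound_by_candidate, bound_by_basal):
    """Fraction of basal-machinery-bound promoters also bound by the candidate."""
    basal = set(bound_by_basal)
    if not basal:
        raise ValueError("no promoters bound by the basal machinery")
    return len(set(bound_by_candidate) & basal) / len(basal)


def is_global_regulator(bound_by_candidate, bound_by_basal, threshold=0.2):
    """Apply the 'ratio greater than 0.2' criterion stated in the text."""
    return global_regulator_ratio(bound_by_candidate, bound_by_basal) > threshold


# Hypothetical promoter identifiers, for illustration only.
pol2_bound = {"p1", "p2", "p3", "p4", "p5"}
candidate_bound = {"p1", "p2", "p9"}
shared = active_targets(candidate_bound, pol2_bound)
ratio = global_regulator_ratio(candidate_bound, pol2_bound)
```

Here two of the five promoters bound by the basal machinery are also bound by the candidate, giving a ratio of 0.4, which exceeds the 0.2 criterion.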

In a preferred embodiment of the methods described above, steps (b) and (c) are 
performed using a DNA microarray. In a specific embodiment, the DNA microarray 
comprises (i) at least 10,000 experimental spots, each experimental spot comprising an 
experimental DNA, each experimental DNA comprising a promoter region from a 
human gene in the subset; and (ii) at least 100 control spots, each control spot 
comprising a control DNA, each control DNA comprising a non-promoter region. Any 
type of microarray or array may be used. 

In one embodiment of the methods described above, the member of the 
transcriptional machinery is an RNA polymerase, such as RNA polymerase II, a 
TATA-binding protein, or any other component of TFIID, including, for example, the 
TAFs (e.g. TAF250, TAF150, TAF135, TAF95, TAF80, TAF55, TAF31, TAF28, and 
TAF20). 

Another aspect of the invention provides methods of identifying regulatory 
networks, or pathways, in a cell. The methods provided by the invention allow the 
identification of the regulatory motifs, such as those shown in Figure 2B. A regulatory 
pathway can include, for example, a pathway that controls a cellular function under a 
specific condition. A regulatory pathway controls a cellular function by, for example, 
altering the activity of a system component or the activity of a biochemical, gene 
expression or other type of pathway. Alterations in activity include, for example, 
inducing a change in the expression, activity, or physical interactions of a pathway 
component under a specific condition. Specific examples of regulatory pathways 
include a pathway that activates a cellular function in response to an environmental 
stimulus of a biochemical system, such as the inhibition of cell differentiation in 
response to the presence of a cell growth signal and the activation of galactose import 
and catalysis in response to the presence of galactose and the absence of repressing 
sugars. The term "component" when used in reference to a network or pathway is 
intended to mean a molecular constituent of the biochemical system, network or 
pathway, such as, for example, a polypeptide, nucleic acid, other macromolecule or 
other biological molecule. 

In one aspect, the invention provides a method of identifying a transcriptional 
regulatory network in a cell, the method comprising determining if a transcriptional 
regulator regulates additional transcriptional regulators in the cell, such as by using any 
of the methods described herein, wherein a transcriptional regulatory network is 
identified if at least one additional transcriptional regulator is regulated by the 
transcriptional regulator. 

Another aspect of the invention provides a method of identifying a 
transcriptional regulatory network in a cell, the method comprising determining if a 
transcriptional regulator regulates (i) its own promoter; or (ii) a promoter from a 
plurality of transcriptional regulators; such as by using any of the methods described 
herein, wherein the experimental DNA comprises (a) a promoter from the 
transcriptional regulator; and (b) promoters from the plurality of transcriptional 
regulators; wherein a transcriptional regulatory network is identified if the 
transcriptional regulator regulates itself or if it regulates at least one of the plurality of 
transcriptional regulators. 

Yet another aspect of the invention provides a method of identifying 
transcriptional regulatory networks in a cell, the method comprising (a) determining, 
by repeating one of the methods described herein for each of a plurality of 
transcriptional regulators, the genes in a subset which are regulated by each of the 
plurality of transcriptional regulators, wherein the experimental DNA comprises 
promoter regions for each of the plurality of transcriptional regulators; and (b) determining 
if any one of the plurality of transcriptional regulators is regulated by at least one of 
the plurality of transcriptional regulators; wherein a transcriptional regulatory network 
is identified if any one of the plurality of transcriptional regulators is regulated by at 
least one of the plurality of transcriptional regulators. 






Specific embodiments of the methods for identifying regulatory networks 
described herein further comprise determining if any of the genes regulated by one of 
the plurality of transcriptional regulators is also a target of any of the other 
transcriptional regulators. 



The invention further provides algorithms for the identification of regulatory 
motifs, which may be used in conjunction with any of the methods provided herein, such 
as the methods for identifying the genes regulated by a transcriptional regulator. In a 
specific embodiment, two data matrices are created. The overall matrix D consists of 
binary entries Dij, where a 1 indicates binding of regulator j to intergenic region i and a 0 
indicates no binding event. The regulator matrix R is a subset of D, containing only the 
rows corresponding to the intergenic region assigned to each regulator, in the same 
order as the columns of regulators. The analyses may be performed using Matlab® 
software. The algorithms to find each motif are described as follows: 



Autoregulatory motif: Find each non-zero entry on the diagonal of R. 
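
As an illustration, the matrices D and R and the autoregulatory search may be sketched in plain Python rather than Matlab. The regulator names, region names, binding calls, and the `promoter_of` assignment are hypothetical; nested lists stand in for the matrices.

```python
# Hypothetical regulators, intergenic regions, and binding calls.
regulators = ["RegA", "RegB", "RegC"]            # column order of D and R
regions = ["iA", "iB", "iC", "i1", "i2"]         # row order of D
promoter_of = {"RegA": "iA", "RegB": "iB", "RegC": "iC"}
bound_pairs = {("RegA", "iA"), ("RegA", "iB"), ("RegB", "iC"),
               ("RegA", "i1"), ("RegB", "i1")}


def build_D(regions, regulators, bound_pairs):
    """D[i][j] is 1 if regulator j binds intergenic region i, else 0."""
    return [[1 if (reg, region) in bound_pairs else 0 for reg in regulators]
            for region in regions]


def build_R(D, regions, regulators, promoter_of):
    """Rows of D for each regulator's own region, in column order."""
    row_of = {region: i for i, region in enumerate(regions)}
    return [D[row_of[promoter_of[reg]]] for reg in regulators]


def autoregulatory_motifs(R, regulators):
    """Regulators with a non-zero entry on the diagonal of R."""
    return [reg for k, reg in enumerate(regulators) if R[k][k]]


D = build_D(regions, regulators, bound_pairs)
R = build_R(D, regions, regulators, promoter_of)
autoregulated = autoregulatory_motifs(R, regulators)
```

In this example only RegA binds the intergenic region assigned to itself, so it is the only regulator reported.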



Feedforward loop: For each master regulator (column of R), find non-zero 
entries, which correspond to regulators bound. For each master regulator / secondary 
regulator pair, find all rows in D bound by both regulators. 
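
The feedforward search may be sketched as follows, using nested lists for D and R. The example data are hypothetical, and skipping the diagonal (so that an autoregulating regulator is not paired with itself) is an assumption of this sketch rather than a stated step.

```python
def feedforward_loops(D, R, regulators, regions):
    """List (master, secondary, shared target regions) triples.

    R[s][m] == 1 means master regulator m binds the region assigned to
    regulator s; D[i][j] == 1 means regulator j binds region i.
    """
    loops = []
    n = len(regulators)
    for m in range(n):                      # each master regulator (column)
        for s in range(n):                  # each candidate secondary regulator
            if s == m or not R[s][m]:
                continue
            shared = [regions[i] for i, row in enumerate(D)
                      if row[m] and row[s]]
            if shared:
                loops.append((regulators[m], regulators[s], shared))
    return loops


# Hypothetical binding data, for illustration only.
regulators = ["RegA", "RegB", "RegC"]
regions = ["iA", "iB", "iC", "i1", "i2"]
D = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
R = D[:3]                                   # rows iA, iB, iC
loops = feedforward_loops(D, R, regulators, regions)
```

Here RegA binds the region assigned to RegB, and both bind region i1, which yields a single feedforward loop.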

Multi-component loop: For each regulator (column of R), find the regulators to 
which it binds. For each of these, find the regulators it binds. If any of these are the 
original regulator, you have a multi-component loop of two. For all others, find 
regulators to which they bind. If any of these are the original, you have a 
multicomponent loop of three. Repeat to find larger loops. 
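
The loop search can be sketched as a depth-limited recursive walk. This is an 
illustrative Python version, assuming R is a nested list of 0/1 entries; the function 
name, the max_len parameter and the rule of reporting each loop once (starting from its 
lowest-numbered regulator) are assumptions added for the example. 

```python
def multicomponent_loops(R, max_len=3):
    """Return closed loops of 2..max_len regulators as tuples of indices."""
    n = len(R)
    # targets[j]: regulators whose promoter region is bound by regulator j
    # (self-binding is excluded; it is the autoregulatory motif).
    targets = [[i for i in range(n) if R[i][j] and i != j] for j in range(n)]
    loops = set()

    def extend(path):
        for nxt in targets[path[-1]]:
            if nxt == path[0] and len(path) >= 2:
                loops.add(tuple(path))           # back at the original regulator
            elif nxt > path[0] and nxt not in path and len(path) < max_len:
                extend(path + [nxt])

    for start in range(n):
        extend([start])
    return sorted(loops)


# Hypothetical example: 0 -> 1, 1 -> 0, 1 -> 2, 2 -> 0.
R = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]
print(multicomponent_loops(R))
```

The example contains one loop of two (0 and 1) and one loop of three (0, 1 and 2). 
Raising max_len corresponds to the "repeat to find larger loops" step. 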

Single input module: Find the intergenic regions bound by only one regulator. 
That is, take the subset of rows of D such that the sum of each row is 1. Then for each 
regulator (column), find non-zero entries. Each set (greater than three intergenic 
regions) is a SIM. 
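
As a rough illustration, the single input module search might look as follows in 
Python. D is assumed to be a nested list of 0/1 entries; the function name, the 
min_size parameter and the example matrix are hypothetical. 

```python
def single_input_modules(D, min_size=3):
    """Map each regulator to the rows it alone binds, if there are more than min_size."""
    # Rows bound by exactly one regulator.
    single = [i for i, row in enumerate(D) if sum(row) == 1]
    modules = {}
    for j in range(len(D[0])):
        rows = [i for i in single if D[i][j]]
        if len(rows) > min_size:                 # "greater than three" regions
            modules[j] = rows
    return modules


# Hypothetical example with two regulators and six intergenic regions.
D = [
    [1, 0],
    [1, 0],
    [1, 0],
    [1, 0],
    [1, 1],   # bound by both regulators, so excluded
    [0, 1],
]
print(single_input_modules(D))
```

Regulator 0 alone binds four regions and forms a SIM; regulator 1 alone binds only one 
region and does not. 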

Multi-input module: Find the intergenic regions bound by more than one 
regulator. That is, take the subset of rows of D such that the sum of each row is greater 
than 1. Then, for each row, find any other row bound by the same regulators. The 
collection of rows bound by the same regulators corresponds to a MIM. Once a row is 
assigned to a MIM, remove it from further analysis. 
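
The multi-input module search amounts to grouping rows by their exact set of bound 
regulators. The Python sketch below assumes D is a nested list of 0/1 entries and 
requires at least two rows per module; that minimum, the function name and the example 
data are assumptions made for illustration. 

```python
from collections import defaultdict


def multi_input_modules(D, min_rows=2):
    """Group rows bound by more than one regulator by their set of regulators."""
    groups = defaultdict(list)
    for i, row in enumerate(D):
        if sum(row) > 1:
            key = tuple(j for j, bound in enumerate(row) if bound)
            groups[key].append(i)                # each row joins exactly one group
    return {key: rows for key, rows in groups.items() if len(rows) >= min_rows}


# Hypothetical example with three regulators.
D = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]
print(multi_input_modules(D))
```

Because every row is placed in exactly one group, no row can be assigned to a second 
MIM, which has the same effect as removing it from further analysis. 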

Regulator chain: For each regulator (column of R), use a recursive algorithm to 
find chains of all lengths. That is, for each regulator whose promoter is bound by the 
regulator before it in the chain, find the regulator promoters to which it binds. Repeat 
until the chain ends. There are three possible ways to end a chain: a regulator that does 
not bind to the promoter of any other regulator, a regulator that binds to its own 
promoter, or one that binds to the promoter of another regulator earlier in the chain. 
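
One possible reading of the recursive chain search is sketched below in Python. It 
assumes R is a nested list of 0/1 entries and reports only maximal chains of at least 
two regulators, extending a chain for as long as the last regulator binds the promoter 
of a regulator not yet in the chain. The function name and example are hypothetical, 
and other readings of the three stopping conditions are possible. 

```python
def regulator_chains(R):
    """Return maximal regulator chains as lists of regulator indices."""
    n = len(R)
    # targets[j]: regulators whose promoter is bound by regulator j.
    targets = [[i for i in range(n) if R[i][j]] for j in range(n)]
    chains = []

    def extend(chain):
        # Binding its own promoter or an earlier regulator does not extend the chain.
        nexts = [t for t in targets[chain[-1]] if t not in chain]
        if not nexts:
            if len(chain) > 1:
                chains.append(chain)
            return
        for t in nexts:
            extend(chain + [t])

    for start in range(n):
        extend([start])
    return chains


# Hypothetical example: 0 -> 1, 1 -> 2, and 2 binds its own promoter.
R = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
]
print(regulator_chains(R))
```

In the example the chain 0, 1, 2 ends at regulator 2, which binds only its own 
promoter. 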

In one preferred embodiment of any of the methods described herein, such as the 
methods for identifying regulatory networks, the experimental DNA in the microarray 
comprises promoter regions from additional transcriptional regulators or from genes 
suspected to encode transcriptional regulators. Such a microarray enables one skilled in 
the art to identify the components of a regulatory pathway. For example, starting with 
one transcriptional regulator, a subset of the genes it regulates is identified using any 
method, such as those described herein. If one identified gene is itself a second 
transcriptional regulator or is suspected to encode a transcriptional regulator, then the 
subset of genes the second transcriptional regulator regulates is identified, and so on. 
Furthermore, the subsets of genes that the first and second transcriptional regulators 
regulate can be compared to determine if any genes are found in both subsets. If so, 
then a feed-forward motif, a unit of a regulatory network, has been identified. 
Likewise, if the second transcriptional regulator is found to regulate the first one, then a 
feedback loop has been identified. 

4. Development of a Therapeutic to Treat or Prevent Disorders 

One aspect of the invention provides methods of identifying targets for the 
development of therapeutics. In particular, the invention provides a method of 
identifying at least one target gene for the development of a therapeutic to treat or 
prevent a disorder in a subject, wherein at least one form of the disorder is caused by an 
altered activity in a transcriptional regulator or in a suspected transcriptional regulator, 
the method comprising (a) identifying the genes regulated by the transcriptional 
regulator in a cell; (b) determining if the transcriptional regulator is a broad-acting 
transcriptional regulator or a narrow-acting transcriptional regulator, wherein if the 
transcriptional regulator is a broad-acting transcriptional regulator then the 
transcriptional regulator is a target gene for the development of a therapeutic, and 
wherein if the transcriptional regulator is a narrow-acting transcriptional regulator then 
(i) determining if at least one gene regulated by the transcriptional regulator is likely 
causative in the disorder, wherein a gene that is likely causative in the disorder is a 
target gene for the development of a therapeutic; and (ii) reiterating steps (a) and (b) for 
at least one gene that is regulated by the transcriptional regulator in the cell and that 
either (1) encodes a transcriptional regulator or (2) is suspected to encode a 
transcriptional regulator, with the modification that the transcriptional regulator of steps 
(a) and (b) is said gene, thereby identifying at least one target gene for the development 
of a therapeutic to treat or prevent a disorder in the subject. 
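
The reiterated steps (a), (b), (i) and (ii) can be pictured as a recursion over 
regulators. The Python sketch below is only an illustration: the input structures 
(targets_of, likely_causative, known_regulators), the 2.5% default threshold and all 
gene names are hypothetical placeholders, and the experimental identification of 
regulated genes is replaced by a dictionary lookup. 

```python
def find_target_genes(regulator, targets_of, n_genes, likely_causative,
                      known_regulators, threshold=0.025, _seen=None):
    """Return candidate target genes reachable from one transcriptional regulator."""
    seen = set() if _seen is None else _seen
    if regulator in seen:                        # guard against regulatory loops
        return set()
    seen.add(regulator)

    regulated = targets_of.get(regulator, set())             # step (a)
    if len(regulated) / n_genes >= threshold:                # step (b): broad-acting
        return {regulator}

    # Narrow-acting: step (i), keep regulated genes that are likely causative.
    found = {g for g in regulated if g in likely_causative}
    # Step (ii): repeat (a) and (b) for regulated genes that are regulators.
    for gene in regulated:
        if gene in known_regulators:
            found |= find_target_genes(gene, targets_of, n_genes, likely_causative,
                                       known_regulators, threshold, seen)
    return found


# Hypothetical inputs: TF_A regulates 3 of 1000 genes, TF_B regulates 30 of 1000.
targets_of = {
    "TF_A": {"gene_1", "gene_2", "TF_B"},
    "TF_B": {f"gene_{i}" for i in range(100, 130)},
}
result = find_target_genes("TF_A", targets_of, n_genes=1000,
                           likely_causative={"gene_1"},
                           known_regulators={"TF_A", "TF_B"})
print(sorted(result))
```

With these placeholder inputs, TF_A is narrow-acting, so its likely causative target 
gene_1 is kept and the procedure is repeated for TF_B, which is broad-acting and is 
itself reported as a target. 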

In some embodiments of the methods for identifying a target gene for the 
development of a therapeutic, the genes regulated by the transcriptional regulator in the 
cell are identified using chromosome-wide location analysis, analysis of mRNA 
transcripts in a cell that expresses the transcriptional regulator, or by using any of the 
methods provided herein for the identification of the genes that are regulated by a 
transcriptional regulator. Some methods may comprise the use of DNA microarrays or 
DNA arrays, such as those described in Gabrielson et al., Obesity Research, 8(5), 
374-384 (2000). 

In some embodiments of the methods described herein for identifying a target 
gene for the development of a therapeutic, the transcriptional regulator is a master 
regulatory gene. In specific embodiments, the master regulatory gene is 
SOX1-18, OCT6, PAX3, Myocardin, GATA1-6, TCF1/HNF1A, HNF4A, HNF6, 
NGN3, C/EBP, FOXA1-3, IPF1, GATA, HNF3, NKX2.1, GDX, FTF/NR5A2, 
C/EBPbeta, SCL1, SKIN1, or a member of the neurogenin, LK, LMO, SOX, OCT, 
PAX, GATA or MyoD family of transcription factors. 

In some embodiments of the methods described herein, the transcriptional 
regulator is PAX3, EGR-1, EGR-2, OCT6, a SOX family member, a GATA family 
member, a PAX family member, an OCT family member, RFX5, WHN, GATA1, 
VDR, CRX, CBP, MeCP2, AML1, p53, PLZF, PML, Rb, WT1, NR3C2, GCCR, 
PPARgamma, SIM1, HNF1alpha, HNF1beta, HNF4alpha, PDX1, MAP A, FOXA2, or 
NEUROD1. 

A transcriptional regulator whose altered activity can lead to disease might be 
expressed in multiple, or all, tissues of an organism, such that any of multiple cell types 
may be used in identifying a therapeutic. In some embodiments of the methods 
described herein for identifying a target gene for the development of a therapeutic, the 
cell is derived from a tissue whose function is impaired in the disorder. For example, a 
pancreatic cell may be used for diabetes, a cardiac muscle cell for myocardial 
infarction, or a neuron for Alzheimer's disease. 

In specific embodiments of the methods described herein for identifying a target 
gene for the development of a therapeutic, the broad-acting gene regulates at least about 
1%, 2% or more preferably at least about 2.5% of the genes in the cell, and the 
narrow-acting gene regulates less than about 1%, 2% or 2.5% of the genes in the cell. 
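
The broad-acting versus narrow-acting distinction reduces to a comparison against a 
chosen cutoff. A minimal Python sketch, assuming the 2.5% cutoff and using hypothetical 
gene counts and a hypothetical function name: 

```python
def classify_regulator(n_regulated, n_genes, threshold=0.025):
    """Label a regulator by the fraction of the cell's genes that it regulates."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    fraction = n_regulated / n_genes
    return "broad-acting" if fraction >= threshold else "narrow-acting"


# Hypothetical counts, for illustration only.
print(classify_regulator(1300, 13000))   # 10% of genes
print(classify_regulator(200, 13000))    # about 1.5% of genes
```

Passing threshold=0.01 or threshold=0.02 corresponds to the other cutoffs recited 
above. 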

In specific embodiments of the methods described herein, a gene is suspected to 
encode a transcriptional regulator if it shares at least about 30%, 40% or 50% amino 
acid sequence identity within at least the DNA binding domain of a transcriptional 
regulator. DNA binding domains and methods of performing nucleic acid and 
polypeptide sequence alignments are well-known in the art. Optimal alignment of 
sequences for comparison may be conducted by the local homology algorithm of Smith 
and Waterman, Adv. Appl. Math. 2: 482 (1981); by the homology alignment algorithm 
of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970); by the search for similarity 
method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85: 2444 (1988); or by 
computerized implementations of these algorithms, including, but not limited to: 
CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif.; and GAP, 
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, 
Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA. The CLUSTAL 
program is well described by Higgins and Sharp, Gene 73: 237-244, 1988; Higgins and 
Sharp, CABIOS 5: 151-153, 1989; Corpet, et al., Nucleic Acids Research 16: 10881-90, 
1988; Huang, et al., Computer Applications in the Biosciences 8: 155-65, 1992; and 
Pearson, et al., Methods in Molecular Biology 24: 307-331, 1994. 
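
Once two DNA binding domains have been aligned by any of the programs above, the 
identity criterion is a simple count. The Python sketch below assumes the two sequences 
are already aligned to equal length with "-" marking gaps, and it computes identity over 
gap-free columns only; that convention, the function name and the toy sequences are 
assumptions, since alignment programs differ in how they report identity. 

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Toy aligned fragments (not real protein sequences).
identity = percent_identity("MKRA-LT", "MKQAELT")
print(round(identity, 1), identity >= 30.0)
```

A gene would be flagged as a suspected transcriptional regulator when the value meets 
the chosen cutoff of about 30%, 40% or 50%. 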

In some specific embodiments of the methods described herein for identifying a 
target gene for the development of a therapeutic, the gene regulated by the 
transcriptional regulator is said to be likely causative of the disorder if a mutation in 
said gene results in at least one phenotype or symptom associated with the disorder. In 
another specific embodiment, the gene regulated by the transcriptional regulator is said 
to be likely causative of the disorder when the gene encodes an enzyme or signaling 
molecule which functions in a pathway that is impaired in the disorder. For example, if 
the disease is type II diabetes, a disorder characterized by hyperglycemia, then a gene 
regulated by the transcriptional regulator which encodes a sugar transporter or an 
enzyme involved in catalyzing a step of glycolysis or gluconeogenesis, or a gene which 
regulates insulin production, secretion or signaling, is said to be likely causative of the 
disorder. In another specific embodiment, the gene regulated by the transcriptional 
regulator is said to be likely causative of the disorder if a mutant allele of the gene is 
genetically linked to a "susceptibility locus" for at least one form of the disease. A 
"susceptibility locus" for a particular disease is a sequence or gene locus implicated in 
the initiation or progression of the disease. The susceptibility locus can be, for example, 
a gene or a microsatellite repeat, as identified by a microsatellite marker, or can be 
identified by a defined single nucleotide polymorphism. Generally, susceptibility genes 
implicated in specific diseases can be found in scientific publications, but 
may also be determined experimentally. 

In some embodiments of the methods described herein for identifying a target 
gene for the development of a therapeutic, the altered activity in the transcriptional 
regulator comprises at least one of the following: (a) an alteration in the binding 
affinity of the transcriptional regulator to DNA; (b) an alteration in the ability of the 
transcriptional regulator to bind to RNA polymerase, to an RNA polymerase 
holoenzyme, or to a second transcriptional regulator; (c) an alteration in the binding 
affinity of the transcriptional regulator to a ligand; (d) an alteration in expression level 
or expression pattern of the transcriptional regulator; or (e) an alteration in an ability of 
the transcriptional regulator to form homomultimers or heteromultimers. 

In some embodiments of the methods described herein, the cell comprises a 
mutant form of the transcriptional regulator. A preferred mutant form of the 
transcriptional regulator is one that causes the disease for which the therapeutic is 
sought. Such embodiments are particularly preferred when a mutant transcriptional 
regulator which causes at least one form of the disease has an altered target specificity, 
such that the genes it regulates, or the extent to which it regulates their transcription, 
are altered when compared to the non-mutant form of the transcriptional regulator. Such 
embodiments may allow the identification of therapeutic targets which might not have 
been identified if a wild-type form of the transcriptional regulator had been used. 
Mutations in the DNA binding domain, for example, may alter the target specificity of 
a transcriptional regulator by altering its affinity for various DNA binding sequences. 






It is well-known to one skilled in the art that mutations in a transcriptional 
regulator may result in a hypomorphic, hypermorphic or neomorphic phenotype. 
Mutations may generally reduce the activity of a transcriptional regulator, may 
generally increase its activity, or may confer novel properties, such as altering the range 
of targets or turning an activator into a repressor or vice versa. In any methods 
described herein, and in particular those for identifying the therapeutics, a cell 
expressing a transcriptional regulator having any of these changes in activity may be 
used. 

The methods described herein may be applied to any disorder in which a 
transcriptional regulator has been implicated. Examples of diseases and the 
transcriptional regulators which cause them may be found in the scientific and medical 
literature by one skilled in the art, including in Medical Genetics, L.V. Jorde et al., 
Elsevier Science 2003; Principles of Internal Medicine, 15th edition, ed. by Braunwald 
et al., McGraw-Hill, 2001; American Medical Association Complete Medical 
Encyclopedia (Random House, Incorporated, 2003); and The Mosby Medical 
Encyclopedia, ed. by Glanze (Plume, 1991). In some embodiments, the disorder is 
characterized by impaired function of at least one of the following: brain, spinal cord, 
heart, arteries, esophagus, stomach, small intestine, large intestine, liver, pancreas, 
lungs, kidney, urinary tract, ovaries, breasts, uterus, testis, penis, colon, prostate, bone, 
muscle, cartilage, thyroid gland, adrenal gland, pituitary, bone marrow, blood, thymus, 
spleen, lymph nodes, skin, eye, ear, nose, teeth or tongue. 

In some embodiments of the methods described herein for identifying a target 
gene for the development of a therapeutic, the subject is a mammal. In preferred 
embodiments, the subject is a human. In some embodiments of the methods described 
herein for identifying a target gene for the development of a therapeutic, the therapeutic 
comprises a small molecule drug, an antisense nucleic acid, an antibody, a peptide, a 
ligand, a fatty acid, a hormone or a metabolite. 

Antisense nucleic acids acting by RNAi include oligonucleotides which 
specifically hybridize (e.g., bind) under cellular conditions with a gene sequence, such 
as at the cellular mRNA and/or genomic DNA level, so as to inhibit expression of that 
gene, e.g., by inhibiting transcription and/or translation. The binding may be by 
conventional base pair complementarity, or, for example, in the case of binding to DNA 
duplexes, through specific interactions in the major groove of the double helix. 
Preferred antisense nucleic acids comprise siRNAs, shRNAs, or any other form of 
double-stranded RNA molecule. Antisense nucleic acids may be chemically modified, 
such as to increase their in vivo stability. 

RNAi is a process of sequence-specific post-transcriptional gene repression 
which can occur in eukaryotic cells. In general, this process involves degradation of an 
mRNA of a particular sequence induced by double-stranded RNA (dsRNA) that is 
homologous to that sequence. For example, the expression of a long dsRNA 
corresponding to the sequence of a particular single-stranded mRNA (ss mRNA) will 
labilize that message, thereby "interfering" with expression of the corresponding gene. 
Accordingly, any selected gene may be repressed by introducing a dsRNA which 
corresponds to all or a substantial part of the mRNA for that gene. It appears that when 
a long dsRNA is expressed, it is initially processed by a ribonuclease III into shorter 
dsRNA oligonucleotides of, in some instances, as few as 21 to 22 base pairs in length. 
Furthermore, RNAi may be effected by introduction or expression of relatively short 
homologous dsRNAs. dsRNAs shorter than about 30 base pairs are preferred to effect 
gene repression by RNAi (see Hunter et al. (1975) J Biol Chem 250: 409-17; Manche et 
al. (1992) Mol Cell Biol 12: 5239-48; Minks et al. (1979) J Biol Chem 254: 10180-3; 
and Elbashir et al. (2001) Nature 411: 494-8). 

Antibodies include whole antibodies, e.g., of any isotype (IgG, IgA, IgM, IgE, 
etc.), and include fragments thereof which are also specifically reactive with a 
vertebrate, e.g., mammalian, protein. Antibodies may be fragmented using conventional 
techniques and the fragments screened for utility in the same manner as described 
above for whole antibodies. Thus, the term includes segments of proteolytically-cleaved 
or recombinantly-prepared portions of an antibody molecule that are capable of 
selectively reacting with a certain protein. Non-limiting examples of such proteolytic 
and/or recombinant fragments include Fab, F(ab')2, Fab', Fv, and single chain 
antibodies (scFv) containing a V[L] and/or V[H] domain joined by a peptide linker. 
The scFv's may be covalently or non-covalently linked to form antibodies having two 
or more binding sites. The subject invention includes polyclonal, monoclonal, 
humanized, or other purified preparations of antibodies and recombinant antibodies. 

Peptidomimetics include compounds containing peptide-like structural elements 
that are capable of mimicking the biological action(s) of a natural parent polypeptide. 

Hormones include any one of a number of biochemical substances that are 
produced by a certain cell or tissue and that cause a specific biological change or 
activity to occur in another cell or tissue located elsewhere in the body. 

Metabolites include any substance produced by metabolism or by a metabolic 
process. "Metabolism", as used herein, refers to the various chemical reactions involved 
in the transformation of molecules or chemical compounds occurring in tissue and the 
cells therein. 

Ligands include any substance which binds to a receptor protein. A ligand of a 
transcriptional regulator protein is a substance which binds to the regulator protein, 
such as estrogen binding to a nuclear hormone receptor. In a preferred embodiment, 
ligand binding to a transcriptional regulator occurs with high affinity. The term 
ligand refers to substances including, but not limited to, a natural ligand, whether 
isolated and/or purified, synthetic, and/or recombinant, and a homolog of a natural 
ligand (e.g., from another mammal). The term ligand encompasses substances which are 
inhibitors or promoters of receptor activity, as well as substances which selectively 
bind receptors, but lack inhibitor or promoter activity. 

Some aspects of the invention relate to the diagnosis of disease states. A 
"transcriptional fingerprint", or listing of the genes that are regulated by a given 
transcriptional regulator, and optionally the extent to which they are regulated, can be 
generated from healthy individuals and from those afflicted with a disorder. Comparison 
of the fingerprints between the two groups may define genes which are specific to one 
of the two groups, and which thus serve as diagnostic indicators that a patient is at risk 
of, or is afflicted with, the disorder. In one embodiment, the transcriptional fingerprint 
of HNF4α is used to diagnose type II diabetes. A biopsy of a subject's liver or pancreas 
may provide the cells for such analysis. 

In specific embodiments, the transcriptional fingerprint disease diagnosis 
analysis is applied to transcriptional regulators which are causative in a particular 
disease to diagnose the disease. This approach may be coupled to allelic genotyping of 
the transcriptional regulator gene in the subject. For example, genotyping of a subject's 
HNF4α may uncover a novel allele. By using the "transcriptional fingerprint" of HNF4α 
in tissue from that patient, one skilled in the art may determine what effect that mutation 
has on HNF4α activity and thus diagnose type II diabetes. 
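
The comparison of fingerprints between two groups can be pictured as a set 
difference. The Python sketch below treats each fingerprint as a plain set of gene 
names and ignores the optional extent-of-regulation information; the function name and 
gene names are hypothetical placeholders, not data from this application. 

```python
def fingerprint_differences(healthy, affected):
    """Return the genes found in only one of two transcriptional fingerprints."""
    return {
        "healthy_only": healthy - affected,
        "affected_only": affected - healthy,
    }


# Placeholder fingerprints for one transcriptional regulator.
healthy = {"gene_a", "gene_b", "gene_c"}
affected = {"gene_b", "gene_c", "gene_d"}
diff = fingerprint_differences(healthy, affected)
print(sorted(diff["healthy_only"]), sorted(diff["affected_only"]))
```

Genes appearing in only one group are the candidates that may serve as diagnostic 
indicators. 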

5. Methods of Preventing/Treating Disease through Regulation of HNFs 

Some aspects of the invention provide methods of treating or preventing disease 
by regulating transcriptional regulator activity, particularly that of the HNF family 
members. The invention provides a method of treating or preventing type II diabetes in 
a subject, comprising administering to the subject a therapeutically effective amount of 
an agent that increases the global transcriptional activity of HNF4alpha. U.S. Patent 
No. 5,849,485, hereby incorporated by reference, describes methods and assays for the 
isolation of modulators of HNF-4α activity. 

The invention also provides a method of treating or preventing a disorder 
associated with low transcriptional activity of HNF4alpha in a subject, comprising 
administering to the subject a therapeutically effective amount of an agent that 
increases the global transcriptional activity of HNF4alpha. In a related aspect, the 
invention provides a method of treating or preventing a disorder associated with high 
transcriptional activity of HNF4alpha in a subject, comprising administering to the 
subject a therapeutically effective amount of an agent that decreases the global 
transcriptional activity of HNF4alpha. 

Yet another related aspect of the invention provides a method of increasing the 
global transcriptional activity in a liver or a pancreatic cell comprising contacting the 
cell with an agent which increases the global transcriptional activity of HNF4alpha. 
Similarly, the invention provides a method of decreasing the global transcriptional 
activity in a liver or a pancreatic cell comprising contacting the cell with an agent 
which decreases the global transcriptional activity of HNF4alpha. 

Applicants have identified genes that are transcriptionally regulated by HNF-1α, 
HNF4α and HNF6 in hepatocytes and pancreatic cells. Accordingly, the invention 
provides methods of regulating the expression level of any of these genes in a cell or in 
a subject by contacting the cell with, or administering to the subject, an agent which 
modulates the expression level or transcriptional regulatory activity of HNF 
transcription factors. 

The invention provides a method of regulating the expression level of any one 
of the genes in Figure 13 in a hepatocyte, the method comprising contacting the cell 
with an agent which regulates the transcriptional activity of HNF1alpha. Similarly, the 
invention also provides a method of regulating the expression level of any one of the 
genes in Figure 14 in a pancreatic cell, the method comprising contacting the cell with 
an agent which regulates the transcriptional activity of HNF1alpha. 

The invention also provides a method of regulating the expression level of any 
one of the genes in Figure 16 in a hepatocyte, the method comprising contacting the 
cell with an agent which regulates the transcriptional activity of HNF6. Similarly, the 
invention provides a method of regulating the expression level of any one of the genes 
in Figure 17 in a pancreatic cell, the method comprising contacting the cell with an 
agent which regulates the transcriptional activity of HNF6. 

The invention additionally provides a method of regulating the expression level 
of any one of the genes in Figure 18 in a hepatocyte, the method comprising contacting 
the cell with an agent which regulates the transcriptional activity of HNF4alpha. 
Similarly, the invention provides a method of regulating the expression level of any one 
of the genes in Figure 19 in a pancreatic cell, the method comprising contacting the cell 
with an agent which regulates the transcriptional activity of HNF4alpha. 

Agents which modulate the transcriptional activity of HNF-4α, or any other 
HNF family member, may be identified by screening compounds for their ability to 
increase the expression level, the DNA binding activity or the transcriptional promoting 
activity of HNF-4α. One assay format which can be used employs two genetic 
constructs. One is typically a plasmid that continuously expresses the transcriptional 
regulator of interest when transfected into an appropriate cell line. CV-1 cells are most 
often used. The second is a plasmid which expresses a reporter, e.g., luciferase, under 
control of the transcriptional regulator. For example, if a compound which acts as a 
ligand for HNF-4 is to be evaluated, one of the plasmids would be a construct that 
results in expression of the HNF-4 receptor in an appropriate cell line, e.g., the CV-1 
cells. The second would possess a promoter linked to the luciferase gene in which an 
HNF-4 response element is inserted. If the compound to be tested is an agonist for the 
HNF-4 receptor, the ligand will complex with the receptor and the resulting complex 
binds the response element and initiates transcription of the luciferase gene. In time the 
cells are lysed and a substrate for luciferase is added. The resulting chemiluminescence 
is measured photometrically. Dose response curves are obtained and can be compared to 
the activity of known ligands. Reporters other than luciferase can be used, including 
CAT and other enzymes. 

Viral constructs can be used to introduce the gene for the receptor and the 
reporter. A usual viral vector is an adenovirus. For further details concerning this 
preferred assay, see U.S. Pat. No. 4,981,784 issued Jan. 1, 1991, hereby incorporated by 
reference, and Evans et al., WO88/03168 published on 5 May 1988, also incorporated 
by reference. 

HNF-4α antagonists can be identified using this same basic "agonist" assay. A 
fixed amount of an agonist is added to the cells with varying amounts of test 
compound to generate a dose response curve. If the compound is an antagonist, 
expression of luciferase is suppressed. 

Additional methods for the isolation of agonists and antagonists of HNF 
transcription factors are described in U.S. Patent Nos. 6,187,533 and 5,620,887. 
Additional U.S. patents describing methods to identify agents that modulate the activity 
of transcription factors include 5,804,374 and 5,298,429, and U.S. Patent Publication 
Nos. 2004/0033942A1, 2003/0077664, 2003/0215829 and 2003/0039980. Any of the 
methods described herein may be easily adapted to identify agonists or antagonists of 
any one of the HNF transcriptional factors. U.S. Patent No. 6,303,653 describes 
modulators of HNF-4 activity. 


Agonists and antagonists of HNF4α can also be designed based on the known 
crystal structure of HNF4α complexed with an endogenous fatty acid ligand 
(Dhe-Paganon, J. Biol. Chem. 277(41), 37973-37976). U.S. Patent Publication No. 
2002/0072587 describes methods of identifying agonists of an estrogen receptor, a 
nuclear receptor like the HNF proteins, based on its crystal structure. Such methods 
may easily be applied to HNF-1α, HNF-4α and HNF6 by one skilled in the art. 
Additional examples of rational drug design based on the structure of a protein may be 
found in U.S. Patent or Publication Nos. 6,236,946, 6,684,162, 2004/0014153, 
2003/0124699, 2003/0077628, 2002/0151028, 2002/0072587 and 2003/0211588. 

6. Therapeutics 

In one aspect, the invention provides methods of treating disease in a subject 
comprising the administration of a composition comprising a therapeutic agent. 
"Therapeutic agent" or "therapeutic" refers to an agent capable of having a desired 
biological effect on a host. Chemotherapeutic and genotoxic agents are examples of 
therapeutic agents that are generally known to be chemical in origin, as opposed to 
biological, or cause a therapeutic effect by a particular mechanism of action, 
respectively. Examples of therapeutic agents of biological origin include growth 
factors, hormones, and cytokines. A variety of therapeutic agents are known in the art 
and may be identified by their effects. Certain therapeutic agents are capable of 
regulating cell proliferation and differentiation. Examples include chemotherapeutic 
nucleotides, drugs, hormones, non-specific (non-antibody) proteins, oligonucleotides 
(e.g., antisense oligonucleotides that bind to a target nucleic acid sequence (e.g., mRNA 
sequence)), peptides, and peptidomimetics. 

In one embodiment, the compositions are pharmaceutical compositions. 
Pharmaceutical compositions for use in accordance with the present invention may be 
formulated in a conventional manner using one or more physiologically acceptable 
carriers or excipients. Thus, the compounds and their physiologically acceptable salts 
and solvates may be formulated for administration by, for example, an aerosol, 
intravenous, oral or topical route. The administration may comprise intralesional, 
intraperitoneal, subcutaneous, intramuscular or intravenous injection; infusion; 
liposome-mediated delivery; or topical, intrathecal, gingival pocket, per rectum, 
intrabronchial, nasal, transmucosal, intestinal, oral, ocular or otic delivery. 

An exemplary composition of the invention comprises a compound capable of 
modulating the expression or activity of a transcriptional regulator with a delivery 
system, such as a liposome system, and optionally including an acceptable excipient. 
In a preferred embodiment, the composition is formulated for injection. 

Techniques and formulations generally may be found in Remington's 
Pharmaceutical Sciences, Mack Publishing Co., Easton, PA. For systemic 
administration, injection is preferred, including intramuscular, intravenous, 
intraperitoneal, and subcutaneous. For injection, the compounds of the invention can 
be formulated in liquid solutions, preferably in physiologically compatible buffers such 
as Hank's solution or Ringer's solution. In addition, the compounds may be formulated 
in solid form and redissolved or suspended immediately prior to use. Lyophilized 
forms are also included. 



For oral administration, the pharmaceutical compositions may take the form of, 
for example, tablets or capsules prepared by conventional means with pharmaceutically 
acceptable excipients such as binding agents (e.g., pregelatinised maize starch, 
polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose, 
microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium 
stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or 
wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods 
well known in the art. Liquid preparations for oral administration may take the form 
of, for example, solutions, syrups or suspensions, or they may be presented as a dry 
product for constitution with water or other suitable vehicle before use. Such liquid 
preparations may be prepared by conventional means with pharmaceutically acceptable 
additives such as suspending agents (e.g., sorbitol syrup, cellulose derivatives or 
hydrogenated edible fats); emulsifying agents (e.g., lecithin or acacia); non-aqueous 
vehicles (e.g., almond oil, oily esters, ethyl alcohol or fractionated vegetable oils); and 
preservatives (e.g., methyl or propyl-p-hydroxybenzoates or sorbic acid). The 
preparations may also contain buffer salts, flavoring, coloring and sweetening agents as 
appropriate. 

Preparations for oral administration may be suitably formulated to give 
controlled release of the active compound. For buccal administration the compositions 
may take the form of tablets or lozenges formulated in a conventional manner. For 
administration by inhalation, the compounds for use according to the present invention 
are conveniently delivered in the form of an aerosol spray presentation from 
pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., 
dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon 
dioxide or other suitable gas. In the case of a pressurized aerosol the dosage unit may 
be determined by providing a valve to deliver a metered amount. Capsules and 
cartridges of, e.g., gelatin for use in an inhaler or insufflator may be formulated 
containing a powder mix of the compound and a suitable powder base such as lactose 
or starch. 

The compounds may be formulated for parenteral administration by injection, 
e.g., by bolus injection or continuous infusion. Formulations for injection may be 
presented in unit dosage form, e.g., in ampoules or in multi-dose containers, with an 
added preservative. The compositions may take such forms as suspensions, solutions 
or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as 
suspending, stabilizing and/or dispersing agents. Alternatively, the active ingredient 
may be in powder form for constitution with a suitable vehicle, e.g., sterile 
pyrogen-free water, before use. 

The compounds may also be formulated in rectal compositions such as 
suppositories or retention enemas, e.g., containing conventional suppository bases such 
as cocoa butter or other glycerides. 

In addition to the formulations described previously, the compounds may also 
be formulated as a depot preparation. Such long acting formulations may be 
administered by implantation (for example subcutaneously or intramuscularly) or by 
intramuscular injection. Thus, for example, the compounds may be formulated with 
suitable polymeric or hydrophobic materials (for example as an emulsion in an 
acceptable oil) or ion exchange resins, or as sparingly soluble derivatives, for example, 
as a sparingly soluble salt. 

Systemic administration can also be by transmucosal or transdermal means. For 
transmucosal or transdermal administration, penetrants appropriate to the barrier to be 
permeated are used in the formulation. Such penetrants are generally known in the art, 
and include, for example, for transmucosal administration bile salts and fusidic acid 
derivatives. In addition, detergents may be used to facilitate permeation. Transmucosal 
administration may be through nasal sprays or using suppositories. For topical 
administration, the oligomers of the invention are formulated into ointments, salves, 
gels, or creams as generally known in the art. A wash solution can be used locally to 
treat an injury or inflammation to accelerate healing. 

The compositions may, if desired, be presented in a pack or dispenser device 
which may contain one or more unit dosage forms containing the active ingredient. 
The pack may for example comprise metal or plastic foil, such as a blister pack. The 
pack or dispenser device may be accompanied by instructions for administration. 


For therapies involving the administration of nucleic acids, the oligomers of the 
invention can be formulated for a variety of modes of administration, including 
systemic and topical or localized administration. Techniques and formulations 
generally may be found in Remington's Pharmaceutical Sciences, Mack Publishing 
Co., Easton, PA. For systemic administration, injection is preferred, including 
intramuscular, intravenous, intraperitoneal, intranodal, and subcutaneous. For injection, 
the oligomers of the invention can be formulated in liquid solutions, preferably in 
physiologically compatible buffers such as Hank's solution or Ringer's solution. In 
addition, the oligomers may be formulated in solid form and redissolved or suspended 
immediately prior to use. Lyophilized forms are also included. 

Systemic administration can also be by transmucosal or transdermal means, or 
the compounds can be administered orally. For transmucosal or transdermal 
administration, penetrants appropriate to the barrier to be permeated are used in the 
formulation. Such penetrants are generally known in the art, and include, for example, 
for transmucosal administration bile salts and fusidic acid derivatives. In addition, 
detergents may be used to facilitate permeation. Transmucosal administration may be 
through nasal sprays or using suppositories. For oral administration, the oligomers are 
formulated into conventional oral administration forms such as capsules, tablets, and 
tonics. For topical administration, oligomers may be formulated into ointments, salves, 
gels, or creams as generally known in the art. 

Toxicity and therapeutic efficacy of the agents and compositions of the present 
invention can be determined by standard pharmaceutical procedures in cell cultures or 
experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the 
population) and the ED50 (the dose therapeutically effective in 50% of the population). 
The dose ratio between toxic and therapeutic effects is the therapeutic index and it can 
be expressed as the ratio LD50/ED50. Compounds which exhibit large therapeutic 
indices are preferred. While compounds that exhibit toxic side effects may be used, 
care should be taken to design a delivery system that targets such compounds to the site 
of affected tissue in order to minimize potential damage to uninfected cells and, 
thereby, reduce side effects. 
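
By way of illustration only, the therapeutic index calculation described above can be sketched in a few lines of Python. The function name and the numerical values are hypothetical and are not taken from any experiment described herein. 

```python
def therapeutic_index(ld50, ed50):
    """Return the therapeutic index, i.e. the ratio LD50/ED50.

    Both doses must be expressed in the same units (for example mg/kg).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive doses")
    return ld50 / ed50


# Hypothetical doses in mg/kg, chosen only to show the arithmetic.
index = therapeutic_index(ld50=500.0, ed50=25.0)
print(index)
```

A larger value indicates a wider margin between the dose that is therapeutically effective and the dose that is lethal in 50% of the population. 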

The data obtained from the cell culture assays and animal studies can be used in 
formulating a range of dosage for use in humans. The dosage of such compounds lies 
preferably within a range of circulating concentrations that include the ED50 with little 
or no toxicity. The dosage may vary within this range depending upon the dosage form 
employed and the route of administration utilized. For any compound used in the 
method of the invention, the therapeutically effective dose can be estimated initially 
from cell culture assays. A dose may be formulated in animal models to achieve a 
circulating plasma concentration range that includes the IC50 (i.e., the concentration of 
the test compound which achieves a half-maximal inhibition of symptoms) as 
determined in cell culture. Such information can be used to more accurately determine 
useful doses in humans. Levels in plasma may be measured, for example, by high 
performance liquid chromatography. 

In one embodiment of the methods described herein, the effective amount of the 
agent is between about 1 mg and about 50 mg per kg body weight of the subject. In one 
embodiment, the effective amount of the agent is between about 2 mg and about 40 mg 
per kg body weight of the subject. In one embodiment, the effective amount of the 
agent is between about 3 mg and about 30 mg per kg body weight of the subject. In one 
embodiment, the effective amount of the agent is between about 4 mg and about 20 mg 
per kg body weight of the subject. In one embodiment, the effective amount of the 
agent is between about 5 mg and about 10 mg per kg body weight of the subject. 

In one embodiment of the methods described herein, the agent is administered 
at least once per day. In one embodiment, the agent is administered daily. In one 
embodiment, the agent is administered every other day. In one embodiment, the agent 
is administered every 6 to 8 days. In one embodiment, the agent is administered 
weekly. 

As for the amount of the compound and/or agent for administration to the 
subject, one skilled in the art would know how to determine the appropriate amount. 
As used herein, a dose or amount would be one in sufficient quantities to either inhibit 
the disorder, treat the disorder, treat the subject or prevent the subject from becoming 
afflicted with the disorder. This amount may be considered an effective amount. A 
person of ordinary skill in the art can perform simple titration experiments to determine 
what amount is required to treat the subject. The dose of the composition of the 
invention will vary depending on the subject and upon the particular route of 
administration used. In one embodiment, the dosage can range from about 0.1 to about 
100,000 µg/kg body weight of the subject. Based upon the composition, the dose can be 
delivered continuously, such as by continuous pump, or at periodic intervals, for 
example, on one or more separate occasions. Desired time intervals of multiple doses of 
a particular composition can be determined without undue experimentation by one 
skilled in the art. 

The effective amount may be based upon, among other things, the size of the 
compound, the biodegradability of the compound, the bioactivity of the compound and 
the bioavailability of the compound. If the compound does not degrade quickly, is 
bioavailable and highly active, a smaller amount will be required to be effective. The 
effective amount will be known to one of skill in the art; it will also be dependent upon 
the form of the compound, the size of the compound and the bioactivity of the 
compound. One of skill in the art could routinely perform empirical activity tests for a 
compound to determine the bioactivity in bioassays and thus determine the effective 
amount. In one embodiment of the above methods, the effective amount of the 
compound comprises from about 1.0 ng/kg to about 100 mg/kg body weight of the 
subject. In another embodiment of the above methods, the effective amount of the 
compound comprises from about 100 ng/kg to about 50 mg/kg body weight of the 
subject. In another embodiment of the above methods, the effective amount of the 
compound comprises from about 1 µg/kg to about 10 mg/kg body weight of the subject. 
In another embodiment of the above methods, the effective amount of the compound 
comprises from about 100 µg/kg to about 1 mg/kg body weight of the subject. 

As for when the compound, compositions and/or agent is to be administered, 
one skilled in the art can determine when to administer such compound and/or agent. 
The administration may be constant for a certain period of time or periodic and at 
specific intervals. The compound may be delivered hourly, daily, weekly, monthly, 
yearly (e.g. in a time release form) or as a one time delivery. The delivery may be 
continuous delivery for a period of time, e.g. intravenous delivery. In one embodiment 
of the methods described herein, the agent is administered at least once per day. In one 
embodiment of the methods described herein, the agent is administered daily. In one 
embodiment of the methods described herein, the agent is administered every other day. 
In one embodiment of the methods described herein, the agent is administered every 6 
to 8 days. In one embodiment of the methods described herein, the agent is 
administered weekly. 

EXEMPLIFICATION 

The invention now being generally described, it will be more readily understood 
by reference to the following examples, which are included merely for purposes of 
illustration of certain aspects and embodiments of the present invention, and are not 
intended to limit the invention, as one skilled in the art would recognize from the 
teachings hereinabove and the following examples, that other DNA microarrays, 
transcriptional regulators, cell types, antibodies, ChIP conditions, or data analysis 
methods, all without limitation, can be employed, without departing from the scope of 
the invention as claimed. 

The practice of the present invention will employ, where appropriate and unless 
otherwise indicated, conventional techniques of cell biology, cell culture, molecular 
biology, transgenic biology, microbiology, virology, recombinant DNA, and 
immunology, which are within the skill of the art. Such techniques are described in the 
literature. See, for example, Molecular Cloning: A Laboratory Manual, 3rd Ed., ed. by 
Sambrook and Russell (Cold Spring Harbor Laboratory Press: 2001); the treatise, 
Methods In Enzymology (Academic Press, Inc., N.Y.); Using Antibodies, Second 
Edition by Harlow and Lane, Cold Spring Harbor Press, New York, 1999; Current 
Protocols in Cell Biology, ed. by Bonifacino, Dasso, Lippincott-Schwartz, Harford, and 
Yamada, John Wiley and Sons, Inc., New York, 1999; and PCR Protocols, ed. by 
Bartlett et al., Humana Press, 2003. 

Various publications, patents, and patent publications are cited throughout this 
application, the contents of which are incorporated herein by reference in their entirety. 

Experimental procedures 

The following procedures were followed in performing the experiments below: 

Genome-scale Location Analysis 

The protocol described here was adapted from Ren 2001. Briefly, cells are fixed 
with 1% final concentration formaldehyde for 10-20 minutes at room temperature, 
harvested and rinsed with 1x PBS. The resultant cell pellet is sonicated, and DNA 
fragments that are crosslinked to a protein of interest are enriched by 
immunoprecipitation with a factor-specific antibody. After reversal of the crosslinking, 
the enriched DNA is amplified using ligation-mediated PCR (LM-PCR), and then 
fluorescently labeled using high concentration Klenow polymerase and a 
dNTP-fluorophore. A sample of DNA that has not been enriched by immunoprecipitation is 
subjected to LM-PCR and labeled with a different fluorophore. Both IP-enriched and 
unenriched pools of labeled DNA are hybridized to a single DNA microarray 
containing 13,000 human intergenic regions (see below for description of the DNA 
microarray and binding site determination). 

For hepatocyte experiments, 2.5 x 10^7 hepatocytes were typically used per 
chromatin immunoprecipitation. These hepatocytes were isolated by standard liver 
perfusion techniques, immediately crosslinked with 1% formaldehyde solution, rinsed, 
and flash frozen. Islet preparations were treated with formaldehyde between 1 hour and 
5 days after isolation from pancreata. A minimum of 30,000 viable islet equivalents 
(approximately 2x 10^ beta cells) were fixed and handled as described above. Typical 
islet purity for three experiments described here was >70% islets with >80% viability. 
HNF4α, HNF6, and RNA polymerase II produced high quality results with as few as 
30,000 islet equivalents. HNF1α ChIP required significantly more material, typically 
80,000 islets, to produce results with somewhat lower enrichment ratios than the results 
obtained with hepatocytes. 

Human 13K DNA Microarray 

It would be ideal to have a DNA microarray that contains the entire human 
genome sequence, but technical limitations and cost led applicants to select the most 
relevant portion of the genome for inclusion in this microarray. Because a significant 
percentage of transcriptional binding sites in proximal promoters are within 1 kb of 
transcription start sites, applicants designed primers to amplify these genomic regions 
for printing onto a promoter array. Applicants selected 15000 cDNAs from the NCBI 
RefSeq database, and mapped them to NCBI Build 22 (April 2001) of the human 
genome using BLAST. Where multiple splice variants had been described, applicants 
used the most upstream site, and verified the 5'-end by alignment with the Database of 
Transcriptional Start Sites (http://elmo.ims.utokyo.ac.jp/dbtss/). Sequences to be 
amplified were extracted from the genomic region -750 bp to +250 bp relative to this 
transcriptional start site. To control for nonspecific binding, 9 amplified regions derived 
from long Arabidopsis open reading frames were included on the array. As a further 
negative control and for use in data normalization, applicants chose 158 ORF regions 
within long exons of human genes for amplification. To prepare the DNA content of 
the arrays, the program Primer3 
(http://www-genome.wi.mit.edu/genome_software/other/primer3.html) was used to 
design primers using the sequences described above. PCRs were performed on these 
primer sets using standard conditions, except for the presence of 1 M betaine in all PCR 
reactions. Betaine was empirically observed to increase the success rate of the 
amplification reactions. 
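
As a non-limiting sketch, the extraction window described above (-750 bp to +250 bp relative to the transcriptional start site) can be computed as follows. The function name, the 1-based inclusive coordinate convention, and the strand handling are assumptions made for illustration; they are not details stated for the applicants' pipeline. 

```python
def promoter_window(tss, strand, upstream=750, downstream=250):
    """Return (start, end) genomic coordinates of the region to amplify.

    tss        -- position of the transcriptional start site (assumed 1-based)
    strand     -- '+' or '-'; on the minus strand "upstream" lies at higher
                  genomic coordinates, so the window is mirrored (assumption)
    upstream   -- bases taken 5' of the start site (750 in the text above)
    downstream -- bases taken 3' of the start site (250 in the text above)
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Do not run off the beginning of the chromosome.
    return max(1, start), end


# Hypothetical start site at position 10,000 on each strand.
print(promoter_window(10000, "+"))
print(promoter_window(10000, "-"))
```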


Of the 13,000 PCR pairs, 70% gave a strong band of the appropriate size, as 
verified on 2% agarose gels. Applicants have noted, however, that PCR products 
undetectable by agarose EtBr gel analysis can give valid positive signals when 
concentrated and printed on the DNA arrays. PCR quality evaluations were performed 
on the BRIDNA suite of programs from the Biotechnology Research Institute of the 
National Research Council of Canada (http://www.irb-bri.cnrc-nrc.gc.ca/). PCR 
products were recovered from the reaction mixture by ammonium acetate/isopropanol 
precipitation and resuspended into 3x SSC with 1.5 M betaine to minimize evaporation 
and improve spot quality. Applicants printed amplified products onto GAPS-coated 
glass slides (Corning) using a Cartesian PixSys 5500 arrayer. The quality of the arrays 
was determined on a batch-wise basis by hybridization with sequence neutral 
oligonucleotides covalently linked to Cy3 or Cy5, followed by calculation of usable 
percentage of spots, combined with direct visual inspection of the quality of the chip. 
The Hu13K array was remapped post-production using two independent methods. First, 
applicants performed electronic PCR on the primer sets against the August 2003 final 
release of the completed human genome. Second, applicants BLASTed the sequence 
used to extract primers for amplification against the August 2003 final release of the 
human genome. The dataset downloadable from the supporting website reports the 
location of each arrayed promoter relative to the transcriptional start site. 

Data Quality Control 

1. ChIP Hybridization Quality Control 

The raw data generated from each array experiment was subjected to multiple 
levels of quality control. First, each scan was examined visually as it was being 
performed. Samples on microarrays with gross defects (e.g. scratches, smeared spots) 
were repeated whenever possible. Applicants also determined that no reliable signal 
was produced from control spots containing Arabidopsis DNA. 

2. Binding Site Determination and Error Model 
Scanned images were analyzed using GenePix (v3.1 or v4.0), to obtain 
background subtracted intensity values. Each spot is bound by both IP-enriched and 
unenriched DNA, which are labeled with different fluorophores. Consequently, each 
spot yields fluorescence intensity information in two channels, corresponding to 
immunoprecipitated DNA and genomic DNA. To account for background hybridization 
to slides, the median intensity of a set of control blank spots was subtracted for 
site-specific transcription factors (e.g. HNF1α), and the median intensity for a set of control 
ORF spots was subtracted for broadly acting DNA binding proteins (e.g. RNA Pol II, 
HNF4α). To correct for different amounts of genomic and immunoprecipitated DNA 
hybridized to the microarray, the median intensity value of the IP-enriched DNA 
channel was divided by the median of the genomic DNA channel, and this 
normalization factor was applied to each intensity in the genomic DNA channel. Next, 
applicants calculated the log of the ratio of intensity in the IP-enriched channel to 
intensity in the genomic DNA channel for each intergenic region across the entire set of 
hybridization experiments. Adjusted intensity values for the IP-enriched channel were 
calculated from these ratios. A whole-chip error model (Hughes 2000; Lee 2002) was 
then used to calculate confidence values for each spot on each microarray, and to 
combine data for the replicates of each experiment to obtain a final average ratio and 
confidence for each promoter region. Genes were included in the set of 'bound' genes 
if the binding P-value in the error model was < 0.001 or enrichment was at least 2-fold 
in the immunoprecipitation. 
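
The background subtraction, median normalization, and threshold steps described above can be sketched as follows. This is a simplified illustration and not the applicants' analysis code: the function names are hypothetical, the same control spots are assumed to supply the background for both channels, the normalization factor is assumed to be applied by multiplication, and the whole-chip error model is not reproduced, so P-values are taken as an input. 

```python
import math
from statistics import median


def normalized_log_ratios(ip, genomic, control_idx):
    """Return log10(IP / genomic) per spot after background subtraction
    and median normalization. Spots whose corrected intensity is not
    positive in either channel yield None (assumption for this sketch)."""
    # Background: median intensity of the control spots in each channel.
    ip_bg = median(ip[i] for i in control_idx)
    gen_bg = median(genomic[i] for i in control_idx)
    ip_bs = [v - ip_bg for v in ip]
    gen_bs = [v - gen_bg for v in genomic]
    # Normalization factor: median IP intensity / median genomic intensity,
    # applied to every intensity in the genomic channel.
    factor = median(ip_bs) / median(gen_bs)
    ratios = []
    for a, g in zip(ip_bs, gen_bs):
        scaled = g * factor
        ratios.append(math.log10(a / scaled) if a > 0 and scaled > 0 else None)
    return ratios


def call_bound(log_ratios, p_values, p_cut=0.001, fold_cut=2.0):
    """A spot is called bound if P < p_cut or enrichment >= fold_cut."""
    threshold = math.log10(fold_cut)
    return [p < p_cut or (r is not None and r >= threshold)
            for r, p in zip(log_ratios, p_values)]


# Hypothetical intensities; spots 0 and 1 play the role of control spots.
ip = [100, 110, 400, 900, 210, 190]
genomic = [50, 60, 150, 160, 155, 145]
p_values = [1, 1, 1, 1, 0.0005, 1]  # stand-ins for error-model output
bound = call_bound(normalized_log_ratios(ip, genomic, [0, 1]), p_values)
print(bound)
```

In this made-up example, spots 2 and 3 pass on the 2-fold enrichment criterion, and spot 4 passes only because its assumed P-value is below 0.001. 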

Confirmation of Predicted Binding 

The accuracy of genome-wide location data reported here has been assessed 
using several approaches. 

1. Estimation of False Positive Rates Using Conventional ChIP Experiments 
Conventional, independent ChIP experiments conducted in our laboratory at a 
gene specific level have confirmed over 100 binding interactions identified by location 
analysis data involving 6 different regulators (see 
http://web.wi.mit.edu/young/pancregulators). These results suggest that our empirical 
rate of false positives is at most 16%. This rate is somewhat higher than that found for a 
large scale survey of yeast transcription factors (Lee 2002), which probably reflects the 
greater complexity of the human genome. Figures 9 and 10 show typical verification 
ChIP experiments for HNF4α and HNF1α, respectively, in hepatocytes. 

2. Comparison with Previous Literature 

Applicants found no previous studies of the genomic targets of transcriptional 
regulators in primary human tissue. However, a large number of HNF1α and HNF4α 
targets have been identified in model organisms and human carcinoma (mostly 
hepatoma) cell lines; these targets are summarized in Figure 14. For example, 
genome-scale location analysis identified 30 of the 68 hepatocyte genes which were both 
previously suggested to be targets of HNF4α, and included on the 13K DNA array. 
Similarly, genome-scale location analysis identified 21 of the 81 hepatocyte genes 
which were both previously suggested to be targets of HNF4α, and included on the 13K 
DNA array. Discrepancies between the targets reported here and targets reported in the 
literature may result from a number of factors, which include, but are not limited to: (1) 
the limitations of using a 1 kb promoter fragment to probe the binding of a transcription 
factor, (2) the stringency of our threshold criteria, (3) the differences between the 
regulatory network in model organisms and/or cell lines, and the regulatory network in 
primary human tissue, (4) differences between indirect technologies in the literature 
(i.e. gel-shift and transient transfections) and genome-scale location analysis, (5) tissue 
isolation effects, among others. A more comprehensive discussion can be found at 
http://web.wi.mit.edu/young/pancregulators 

Regulatory Motifs Derived from Binding Data 

In order to discover network motifs, two data matrices were created. The overall 
matrix D consists of binary entries Dij, where a 1 indicates binding of regulator j to 
intergenic region i, and a 0 indicates no binding event. The regulator matrix R is a subset of 
D, containing only the rows corresponding to the intergenic region assigned to each 
regulator, in the same order as the columns of regulators. All analyses were performed 
in Matlab. The algorithms used to find each motif are described below. 

Autoregulatory motif: Find each non-zero entry on the diagonal of R. 

Feedforward loop: For each master regulator (column of R), find non-zero entries, 
which correspond to regulators bound. For each master regulator / secondary regulator 
pair, find all rows in D bound by both regulators. 

Multi-component loop: For each regulator (column of R), find the regulators to 
which it binds. For each of these, find the regulators it binds. If any of these are the 
original regulator, you have a multi-component loop of two. For all others, find 
regulators to which they bind. If any of these are the original, you have a 
multi-component loop of three. Repeat to find larger loops. 

Single input module: Find the intergenic regions bound by only one regulator. 
That is, take the subset of rows of D such that the sum of each row is 1. Then for each 
regulator (column), find non-zero entries. Each set (greater than three intergenic 
regions) is a SIM. 

Multi-input module: Find the intergenic regions bound by more than one 
regulator. That is, take the subset of rows of D such that the sum of each row is greater 
than 1. Then, for each row, find any other row bound by the same regulators. The 
collection of rows bound by the same regulators corresponds to a MIM. Once a row is 
assigned to a MIM, remove it from further analysis. 

Regulator chain: For each regulator (column of R), use a recursive algorithm to 
find chains of all lengths. That is, for each regulator whose promoter is bound by the 
regulator before it in the chain, find the regulator promoters to which it binds. Repeat 
until the chain ends. There are three possible ways to end a chain: a regulator that does 
not bind to the promoter of any other regulator, a regulator that binds to its own 
promoter, or one that binds to the promoter of another regulator earlier in the chain. 
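
For illustration, four of the motif searches described above (autoregulation, feedforward loop, single input module, and multi-input module) can be sketched in Python rather than Matlab. This is a minimal sketch under stated assumptions, not the applicants' code: the function names are hypothetical, D and R are plain lists of 0/1 rows, R[s][m] == 1 is taken to mean that regulator m binds the promoter of regulator s, and every group of identically bound rows is reported as a MIM regardless of size. Multi-component loops and regulator chains are omitted. 

```python
def autoregulated(R):
    """Regulators with a non-zero entry on the diagonal of R."""
    return [j for j in range(len(R)) if R[j][j]]


def feedforward_loops(D, R):
    """Map (master, secondary) -> rows of D bound by both regulators,
    for every secondary regulator whose promoter the master binds."""
    loops = {}
    for master in range(len(R)):
        for secondary in range(len(R)):
            if secondary == master or not R[secondary][master]:
                continue
            targets = [i for i, row in enumerate(D)
                       if row[master] and row[secondary]]
            if targets:
                loops[(master, secondary)] = targets
    return loops


def single_input_modules(D, min_size=4):
    """Map regulator -> rows bound by that regulator alone; a set counts
    as a SIM only if it has more than three rows (min_size=4)."""
    modules = {}
    for j in range(len(D[0])):
        rows = [i for i, row in enumerate(D) if sum(row) == 1 and row[j]]
        if len(rows) >= min_size:
            modules[j] = rows
    return modules


def multi_input_modules(D):
    """Group rows bound by more than one regulator by their exact set of
    bound regulators; each row is assigned to exactly one group."""
    groups = {}
    for i, row in enumerate(D):
        if sum(row) > 1:
            key = tuple(j for j, v in enumerate(row) if v)
            groups.setdefault(key, []).append(i)
    return groups


# Hypothetical binding matrix: 3 regulators (columns 0, 1, 2); rows 0-2 are
# the regulators' own promoters, rows 3-8 are other intergenic regions.
D = [[1, 0, 0], [1, 0, 0], [0, 0, 0],
     [1, 1, 0], [1, 1, 0],
     [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
R = D[:3]
print(autoregulated(R), feedforward_loops(D, R))
print(single_input_modules(D), multi_input_modules(D))
```

In this made-up matrix, regulator 0 binds its own promoter and the promoter of regulator 1, and both bind rows 3 and 4, which gives one feedforward loop and one multi-input module; regulator 2 alone binds four rows, which gives one single input module. 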









Example 1 

The liver and pancreas have long been the subject of studies to understand how 
organs develop and are regulated at the transcriptional level (8-12). The transcriptional 
regulators HNF1α (a homeodomain protein), HNF4α (a nuclear receptor) and HNF6 (a 
member of the onecut family) operate cooperatively in a connected network in the liver, 
but less is known about the structure of this regulatory network in human pancreatic 
islets. All three transcriptional regulators are required for normal function of liver and 
pancreatic islets (13-18). Mutations in HNF1α and HNF4α are the causes of the type 3 
and type 1 forms of maturity-onset diabetes of the young (MODY3 and MODY1), a 
genetic disorder of the insulin-secreting pancreatic beta cells characterized by onset of 
diabetes mellitus before 25 years of age and an autosomal dominant pattern of 
inheritance (19). 

Applicants hypothesized that genome-scale analysis of the pancreatic islet genes 
whose expression is regulated by these transcription factors in normal beta cells could 
provide insights into the molecular basis of the abnormal beta cell function that 
characterizes MODY. Applicants have identified the genes occupied by the 
transcription factors HNF1α, HNF4α, and HNF6 in pancreatic islets. The genes 
transcribed in each tissue were identified by determining the genomic occupancy of 
RNA polymerase II. Applicants used this information to begin to map the 
transcriptional regulatory circuitry in these tissues. 

Applicants first used genome-scale location analysis (20) to identify the 
promoters bound by HNF1α in human hepatocytes and pancreatic islets isolated from 
tissue donors (Fig 1A). For each tissue, HNF1α-DNA complexes were enriched by 
chromatin immunoprecipitation in three separate experiments. Applicants constructed 
a custom DNA microarray containing portions of promoter regions of 13,000 human 
genes (Hu13K array). Applicants targeted the region spanning 700 bp upstream and 
200 bp downstream of transcription start sites for the genes whose start sites are best 
characterized based on National Center for Biotechnology Information annotation (20). 
Although many enhancers are present at more distant locations, most known 
transcription factor binding site sequences occur within these start-site proximal regions 
of promoters. 

The results of these genome location experiments revealed that HNF1α is 
bound to at least 222 target genes in hepatocytes, representing 1.6% of the genes on the 
Hu13K array (Figure 11) (20). This result was verified with independent, conventional 
chromatin immunoprecipitation experiments, which suggest that the frequency of false 
positives in genome-scale location data with gene-specific regulators is no more than 
16% when our threshold criteria were used (20). The genes applicants found to be 
occupied by HNF1α in primary human hepatocytes encode products whose functions 
represent a significant cross-section of hepatocyte biochemistry. The results confirm 
that HNF1α contributes to the transcriptional regulation of many of the central 
rate-limiting steps in gluconeogenesis and associated pathways. HNF1α also binds to genes 
whose products are central to normal hepatic function, including carbohydrate synthesis 
and storage, lipid metabolism (synthesis of cholesterol and apolipoproteins), 
detoxification (synthesis of cytochrome P450s) and synthesis of serum proteins 
(albumin, complements and coagulation factors). 

Applicants next identified HNF1α target genes in human pancreatic islets 
(Figure 11) (20). HNF1α occupied the promoter regions of 106 genes (0.8% of the 
Hu13K array promoters) in islets, 30% of which were also bound by HNF1α in 
hepatocytes (Figure 1B). In islets, fewer chaperones and enzymes are bound by HNF1α 
than in hepatocytes, and the receptors and signal transduction machinery regulated by 
HNF1α vary between the two tissues. 

HNF1α has been previously implicated in the regulation of many genes in 
hepatocytes and islets (13, 16, 20 [Figure 15]). The direct genome binding data 
reported here confirmed many, but not all, of these genes. The difference may be due, 
at least in part, to our stringent criteria for binding in the genome-scale data, which 
enhances our confidence in the direct target genes identified by location analysis, but 
likely underestimates the actual number of targets in vivo. Furthermore, although the 
proximal promoter regions printed on the array contain a significant number of 
transcription factor binding sequences, many genes are also regulated by more distal 
promoter elements and enhancers that are not present on the Hu13K array. 

Applicants also identified the promoters bound by HNF6 in human hepatocytes 
and pancreatic islets using genome-scale location analysis (Fig 1B; Figures 16 and 17) 
(20). HNF6 was bound to at least 222 genes in hepatocytes and 189 genes in pancreatic 
islets, representing 1.7% and 1.4% of the promoters on the array, respectively. 
Approximately half of the promoters occupied by HNF6 were common to the two 
tissues, and included a number of important cell cycle regulators such as CDK2 (20). 

Genome-scale location analysis revealed surprising results for HNF4α in 
hepatocytes and pancreatic islets (Fig 1B). The number of genes enriched in HNF4α 
chromatin immunoprecipitations was much larger than observed with typical 
site-specific regulators. HNF4α was bound to approximately 12% of the genes represented 
on the Hu13K DNA microarray in hepatocytes and 11% in pancreatic islets. No other 
transcription factor applicants have profiled in human cells has been observed to bind 
more than 2.5% of the promoter regions represented on the 13K array. 

Six independent lines of evidence indicate that the HNF4α results are not due to 
poor antibody specificity or errors in the microarray analysis, and support the view that 
HNF4α is associated with an unusually large number of promoters in hepatocytes and 
pancreatic islets (20). First, essentially identical results were obtained with two 
different antibodies that recognize different portions of HNF4α. Second, Western blots 
showed that the HNF4α antibodies are highly specific. Third, applicants verified 
binding at over 50 randomly selected targets of HNF4α in hepatocytes by conventional 
gene-specific chromatin immunoprecipitation. Fourth, when antibodies against HNF4α 
were used for ChIP in control experiments with Jurkat, U937, and BJT cells (which do 
not express HNF4α), no more than 17 promoters were identified in each cell line by 
our criteria, which is well within the noise inherent in this system. Fifth, when 
pre-immune antibodies from rabbit and goat (the two different anti-HNF4α antibodies 
came from rabbit and goat) were used in control experiments in hepatocytes, the 
number of targets identified was within the noise. Finally, if the HNF4α results are 
correct, then applicants would expect that the set of promoters bound by HNF4α should 
be largely a subset of those bound by RNA polymerase II in each tissue; applicants 
found that this is the case (see below). Applicants conclude that HNF4α is a widely 
acting transcription factor in these tissues, consistent with the observation that it is an 
unusually abundant, constitutively active transcription factor (11). 

Applicants next identified the genes represented on the Hu13K microarray that 
are actively transcribed in hepatocytes and pancreatic islets, so the fraction of actively 
transcribed genes that are bound by HNF4a could be determined (Fig 2C). It is 
difficult to determine accurately the transcriptome of these tissues by profiling 
transcript levels with DNA microarrays. Transcript profiling requires a reference RNA 
population against which a tissue RNA population can be compared, and there are 
limitations to generating appropriate reference RNA. To circumvent this limitation, 
applicants exploited the fact that RNA polymerase II occupies the set of protein-coding 
genes that are actively transcribed in eukaryotic cells. Location analysis with RNA 
polymerase II antibodies can identify these actively transcribed genes (7, 21). 
Applicants found that 23% of the genes on the Hu13K array (2984 genes) were bound 
by RNA polymerase II in hepatocytes, and 19% (2426 genes) were bound by RNA 
polymerase II in islets (20). The sets of genes occupied by RNA polymerase II in 
hepatocytes and islets overlapped substantially (81% overlap, relative to islets), 
consistent with the relatedness of the two tissues (22). As expected, the majority of 
genes occupied by HNF4a in hepatocytes and pancreatic islets (80% and 73%, 
respectively) were also occupied by RNA polymerase II. Remarkably, of the genes 
occupied by RNA polymerase II, 42% (1262/2984) were bound by HNF4a in 
hepatocytes and 43% (1047/2426) were bound by HNF4a in islets (Fig 1C). By 
comparison, only 6% and 2% of RNA polymerase II enriched promoters were also 
bound by HNF1a in hepatocytes and islets, respectively. 
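
The overlap percentages in the preceding paragraph follow from simple set arithmetic on the lists of bound genes. The Python sketch below recomputes the two HNF4a fractions from the counts reported above and shows the same calculation on gene sets. The function name `fraction_bound` and the small toy gene sets are illustrative assumptions and are not part of the applicants' analysis.

```python
def fraction_bound(regulator_genes, pol2_genes):
    """Fraction of RNA polymerase II-occupied genes also bound by a regulator."""
    pol2 = set(pol2_genes)
    if not pol2:
        raise ValueError("RNA polymerase II gene set is empty")
    return len(set(regulator_genes) & pol2) / len(pol2)


# Counts reported in the text: genes bound by both HNF4a and RNA polymerase II,
# divided by all genes bound by RNA polymerase II.
hepatocyte_fraction = 1262 / 2984
islet_fraction = 1047 / 2426
print(f"hepatocytes: {hepatocyte_fraction:.0%}")  # hepatocytes: 42%
print(f"islets: {islet_fraction:.0%}")            # islets: 43%

# Toy gene sets (hypothetical, for illustration only).
toy_pol2 = {"PCK1", "ALB", "APOH", "GJB1"}
toy_hnf4a = {"PCK1", "ALB", "INSR"}
toy_fraction = fraction_bound(toy_hnf4a, toy_pol2)  # 2 of 4 genes overlap
```

Dividing by the RNA polymerase II set, rather than by the whole array, restricts the comparison to genes that are actively transcribed in the tissue.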

Previous studies indicate that HNF1a, HNF4a, and HNF6 are at the center of a 
network of transcription factors that cooperatively regulate numerous developmental 
and metabolic functions in hepatocytes and islets (P, 13, 15, 17). Our systematic 
analysis of the direct in vivo targets of these factors significantly expands our 
understanding of the regulatory network in primary human tissues (Fig 2A). A 
comparison of the regulatory network in these two tissues reveals that HNF1a, HNF4a, 
and HNF6 occupy the promoters of genes encoding a large population of transcription 
factors and cofactors in the two tissues (20). The precise set of transcription factor 
genes occupied by HNF1a, HNF4a, and HNF6, and the extent to which they are 
co-occupied by the HNF regulators, differed substantially between these two tissues. 

The transcription factor binding data was used to identify regulatory network 
motifs, simple units of transcriptional regulatory network architecture that suggest 
mechanistic models (Fig 2B) (4, 23). Our data confirm previous reports that HNF1a 
and HNF4a occupy one another's promoters in both hepatocytes and islets, forming a 
multi-component loop (24-26). Multicomponent loops provide the capacity for 
feedback control and produce bistable systems that can switch between two alternate 
states (23). It has been suggested that the multicomponent loop present between 
HNF1a and HNF4a is responsible for stabilization of the terminal phenotype in 
pancreatic beta cells (25). Applicants also found that HNF6 serves as a master 
regulator for feedforward motifs in hepatocytes and pancreatic islets involving over 80 
genes in each tissue (Figures 20 and 22). For example, in hepatocytes, HNF6 binds the 
HNF4a7 promoter, and HNF6 and HNF4a together bind PCK1, which encodes 
phosphoenolpyruvate carboxykinase, an enzyme key to gluconeogenesis (Fig 2B). A 
feedforward loop can act as a switch designed to be sensitive to sustained, rather than 
transient, inputs (23). HNF1a, HNF4a and HNF6 were also found to form multi-input 
motifs by collectively binding to sets of genes in hepatocytes and islets. This 
regulatory motif suggests coordination of gene expression through multiple input 
signals. Applicants also found that HNF6, HNF4a, and HNF1a form a regulator chain 
motif with THRA (NR1D1); regulator chain motifs represent the simplest circuit logic 
for ordering transcriptional events in a temporal sequence (4, 23). Additional examples 
of these regulatory motifs can be found in Figures 20 and 23 (20). Figures 20-24, 
panels A and B, show transcriptional regulators occupied by HNF transcription factors 
and their regulatory loops. Figures 4-10 show additional controls and data generated by 
the experiments described herein. 
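
The multicomponent loop and feedforward motifs described above can be read off a table of regulator-to-promoter binding events. The Python sketch below is one minimal way to do this under stated assumptions: the function names are hypothetical, the edge list contains only the binding events named in the text, and binding of HNF6 at the HNF4a7 promoter is treated as an edge into the HNF4a node. It is not the applicants' motif-finding procedure.

```python
from itertools import permutations


def multicomponent_loops(edges):
    """Pairs of regulators that occupy one another's promoters."""
    return {frozenset((a, b)) for a, b in edges if a != b and (b, a) in edges}


def feedforward_loops(edges, regulators):
    """Triples (master, intermediate, target) in which the master binds the
    intermediate regulator's promoter and both bind the target's promoter."""
    loops = set()
    for master, mid in permutations(regulators, 2):
        if (master, mid) not in edges:
            continue
        for source, target in edges:
            if (source == mid and target not in (master, mid)
                    and (master, target) in edges):
                loops.add((master, mid, target))
    return loops


# Binding events named in the text, as (regulator, bound promoter) pairs.
# Assumption of this sketch: the HNF4a7 promoter is mapped to the HNF4a node.
edges = {
    ("HNF1a", "HNF4a"),
    ("HNF4a", "HNF1a"),
    ("HNF6", "HNF4a"),
    ("HNF6", "PCK1"),
    ("HNF4a", "PCK1"),
}
regulators = {"HNF1a", "HNF4a", "HNF6"}

loops = multicomponent_loops(edges)
ffl = feedforward_loops(edges, regulators)
print(loops)  # the HNF1a/HNF4a pair
print(ffl)    # the HNF6 -> HNF4a -> PCK1 feedforward motif
```

Representing binding events as a set of pairs keeps each membership check constant-time, which matters when the same search is run over thousands of promoters.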
Our results suggest that the nuclear hormone receptor HNF4a contributes to 
regulation of a large fraction of the liver and pancreatic islet transcriptomes by binding 
directly to almost half of the actively transcribed genes. This likely explains why 
HNF4a is crucial for development and proper function of these tissues (12-15, 11, 18). 
Perhaps most importantly, our results suggest a mechanistic explanation for the recent 
discovery that polymorphisms in the islet-specific P2 promoter for the splice variant 
HNF4a7 can greatly increase the risk of type II diabetes (27-30). Applicants found that 
multiple HNF factors bind directly to the P2 promoter in primary, healthy human islets. 
Alterations in the binding sites for these factors could cause misregulation of HNF4a 
expression and thus its downstream targets, leading to beta cell malfunction and 
diabetes. 
Claims: 

1. A method of determining which genes from a subset of genes are regulated by a 
transcriptional regulator expressed in a cell, the method comprising 

(a) selectively isolating chromatin from the cell to generate isolated 
chromatin; 

(b) selectively isolating chromatin fragments from the isolated chromatin 
to generate bound chromatin fragments, wherein the bound chromatin 
fragments are bound by the transcriptional regulator; 

(c) amplifying both the bound chromatin fragments to generate amplified 
chromatin fragments and the isolated chromatin to generate amplified 
control chromatin; 

(d) hybridizing the amplified control chromatin and the amplified 
chromatin fragments to a DNA microarray, wherein the DNA 
microarray comprises 

(1) at least 10,000 experimental spots, each experimental spot 
comprising an experimental DNA, each experimental DNA 
comprising a promoter region from a gene in the subset; and 

(2) at least 100 control spots, each control spot comprising a 
control DNA, each control DNA comprising a non-promoter 
region; and 

(e) determining and comparing a hybridization signal at each of the spots 
on the microarray between those generated by 

(1) the amplified control chromatin; and 

(2) the amplified chromatin fragments; 

wherein a gene in the subset is said to be regulated by the transcriptional 
regulator in the cell if a spot comprising a promoter region of said gene displays 
a higher level of hybridization by the amplified chromatin fragments than by 
the amplified control chromatin. 

2. The method of claim 1, wherein the level of hybridization of the amplified 
chromatin fragments to each experimental spot is normalized by the level of 
hybridization of the amplified chromatin fragments to the control spots. 

3. The method of claim 1, wherein the level of hybridization of the amplified 
chromatin fragments to each experimental spot is normalized by subtracting the 
mean level of hybridization of the amplified chromatin fragments to the control 
spots. 

4. The method of claim 1, wherein the higher level of hybridization comprises at 
least a two-fold higher level of hybridization. 

5. The method of claim 1, wherein the transcriptional regulator is native to the 
cell. 

6. The method of claim 1, wherein the transcriptional regulator is not a 
recombinant transcriptional regulator. 

7. The method of claim 1, wherein the cell is a primary cell. 

8. The method of claim 7, wherein the cell is a human cell. 

9. The method of claim 8, wherein the cell is a transplant-grade human cell. 

10. The method of claim 1, wherein step (b) comprises immunoprecipitation of the 
transcriptional regulator. 

11. The method of claim 1, wherein step (c) comprises ligation-mediated 
polymerase chain reaction (LM-PCR). 

12. The method of claim 1, wherein the promoter region of the gene comprises 
from at least 700 bp upstream to at least 200 bp downstream of the 
transcriptional start site of the gene. 
13. The method of claim 1, wherein the promoter region comprises at least 30, 40, 
50, or 60 nucleotides in length. 

14. The method of claim 1, wherein the promoter region of the gene comprises a 
sequence of at least 30 nucleotides whose sequence is identical to a region 
stretching from 3 kb upstream to 1 kb downstream of the transcriptional start 
site of said gene. 

15. The method of claim 1, wherein the non-promoter region comprises an open 
reading frame. 

16. The method of claim 1, wherein the transcriptional regulator is a basal 
transcription factor. 

17. The method of claim 16, wherein the transcriptional regulator is an RNA 
polymerase II or a TATA-binding protein. 

18. A method of identifying a transcriptional regulatory network in a cell, the 
method comprising determining if a transcriptional regulator regulates 
additional transcriptional regulators in the cell using the method of claim 1, 
wherein a transcriptional regulatory network is identified if at least one 
additional transcriptional regulator is determined to be regulated by the 
transcriptional regulator. 

19. The method of claim 18, wherein the experimental DNA comprises promoter 
regions from the additional transcriptional regulators. 



20. A method of identifying a transcriptional regulatory network in a cell, the 
method comprising determining if a transcriptional regulator regulates 

(i) its own promoter; or 

(ii) a promoter from a plurality of transcriptional regulators, 

using the method of claim 1, wherein the experimental DNA comprises 

(a) a promoter from the transcriptional regulator; and 

(b) promoters from the plurality of transcriptional regulators; 

wherein a transcriptional regulatory network is identified if the transcriptional 
regulator regulates itself or if it regulates at least one of the plurality of 
transcriptional regulators. 

21. A method of identifying transcriptional regulatory networks in a cell, the 
method comprising 

(a) determining, by repeating the method of claim 1 for each of a plurality of 
transcriptional regulators, the genes in a subset which are regulated by 
each of the plurality of transcriptional regulators, wherein the 
experimental DNA comprises promoter regions for each of the plurality 
of transcriptional regulators; 

(b) determining if any one of the plurality of transcriptional regulators is 
regulated by at least one of the plurality of transcriptional regulators; 

wherein a transcriptional regulatory network is identified if any one of the 
plurality of transcriptional regulators is regulated by at least one of the plurality 
of transcriptional regulators. 

22. The method of claim 21, further comprising determining if a gene is regulated 
by more than one of the plurality of transcriptional regulators. 

23. A DNA microarray for determining promoter occupancy in a human cell, the 
microarray comprising 

(1) at least 10,000 experimental spots, each experimental spot comprising an 
experimental DNA, each experimental DNA comprising a promoter 
region from a human gene in the subset; and 

(2) at least 100 control spots, each control spot comprising a control DNA, 
each control DNA comprising a non-promoter region; 

wherein at least 75% of the promoter regions comprise from at least 700 bp 
upstream to at least 200 bp downstream of the transcriptional start site. 
24. A method of estimating if a transcriptional regulator is a global transcriptional 
regulator, the method comprising 

(a) selectively isolating chromatin from a tissue; 

(b) identifying promoter regions from the chromatin which are bound by 
a candidate global transcriptional regulator; 

(c) identifying promoter regions from the chromatin which are bound by 
a member of the basal transcriptional machinery; and 

(d) comparing the promoter regions identified in steps (b) and (c) to 
determine the ratio between (i) the number of promoter regions 
bound by both the candidate global transcriptional regulator and the 
member of the basal transcriptional machinery; and (ii) the number of 
promoter regions bound by the member of the basal transcriptional 
machinery 

wherein a transcriptional regulator is a global transcriptional regulator when the 
ratio is greater than 0.2. 

25. The method of claim 24, wherein steps (b) and (c) are performed using a DNA 
microarray. 

26. The method of claim 25, wherein the DNA microarray comprises 

(i) at least 10,000 experimental spots, each experimental spot comprising an 
experimental DNA, each experimental DNA comprising a promoter 
region from a human gene in the subset; and 

(ii) at least 100 control spots, each control spot comprising a control DNA, 
each control DNA comprising a non-promoter region. 

27. The method of claim 24, wherein the member of the basal transcriptional 
machinery is an RNA polymerase II or a TATA-binding protein. 

28. The method of claim 24, wherein the tissue is transplant-grade tissue. 

29. The method of claim 24, wherein the tissue is freshly-isolated human tissue. 
30. The method of claim 29, wherein the tissue is from a subject afflicted with a 
disorder. 

31. The method of claim 30, wherein the disorder is a hyperplastic condition. 

32. A method of identifying at least one target gene for the development of a 
therapeutic to treat or prevent a disorder in a subject, wherein at least one form 
of the disorder is caused by an altered activity in a transcriptional regulator or in 
a suspected transcriptional regulator, the method comprising 

(a) identifying the genes regulated by the transcriptional regulator in a cell; 

(b) determining if the transcriptional regulator is a broad-acting 
transcriptional regulator or a narrow-acting transcriptional regulator, 
wherein if the transcriptional regulator is a broad-acting transcriptional 
regulator then the transcriptional regulator is a target gene for the 
development of a therapeutic, and wherein if the transcriptional 
regulator is a narrow-acting transcriptional regulator then 

(i) determining if at least one gene regulated by the transcriptional 
regulator is likely causative in the disorder, wherein a gene that 
is likely causative in the disorder is a target gene for the 
development of a therapeutic; and 

(ii) reiterating steps (a) and (b) for at least one gene that is 
regulated by the transcriptional regulator in the cell and that 
either 

(1) encodes a transcriptional regulator or 

(2) is suspected to encode a transcriptional regulator, 

with the modification that the transcriptional regulator of steps (a) and 
(b) is said gene, 

thereby identifying at least one target gene for the development of a therapeutic 
to treat or prevent a disorder in the subject. 

33. The method of claim 32, wherein identifying the genes regulated by the 
transcriptional regulator in a cell comprises chromosome-wide location 
analysis. 

34. The method of claim 32, wherein identifying the genes regulated by the 
transcriptional regulator in the cell comprises using the method of claim 1. 

35. The method of claim 32, wherein the transcriptional regulator is a master 
regulatory gene. 

36. The method of claim 35, wherein the master regulatory gene is SOX1-18, 
OCT6, PAX3, Myocardin, GATA1-6, TCF1/HNF1A, HNF4A, HNF6, NGN3, 
C/EBP, FOXA1-3, IPF1, GATA, HNF3, NEDQ.l, CDX, FTF/NR5A2, 
C/EBPbeta, SCL1, SKIN1, or a member of the neurogenin, LK, IMO, SOX, 
OCT, PAX, GATA or MyoD family of transcription factors. 

37. The method of claim 32, wherein the transcriptional regulator is PAX3, EGR-1, 
EGR-2, OCT6, a SOX family member, a GATA family member, a PAX family 
member, an OCT family member, RFX5, WHN, GATA1, VDR, CRX, CBP, 
MeCP2, AML1, p53, PLZF, PML, Rb, WT1, NR3C2, GCCR, PPARgamma, 
SIM1, HNF1alpha, HNF1beta, HNF4alpha, PDX1, MAFA, FOXA2, or 
NEUROD1. 

38. The method of claim 32, wherein the cell is derived from a tissue whose 
function is impaired in the disorder. 

39. The method of claim 32, wherein the broad-acting gene regulates at least 
about 2.5% of the genes in the cell, and wherein the narrow-acting gene 
regulates less than about 2.5% of the genes in the cell. 

40. The method of claim 32, wherein the gene is suspected to encode a 
transcriptional regulator if it shares at least 30% amino acid sequence identity 
with the DNA binding domain of a transcriptional regulator. 
41. The method of claim 32, wherein the transcriptional regulator in the cell is a 
mutant transcriptional regulator. 

42. The method of claim 32, wherein the transcriptional regulator in the cell has 
altered activity. 

43. The method of claim 32, wherein the gene regulated by the transcriptional 
regulator is likely causative of the disorder when a mutation in the gene results 
in at least one phenotype or symptom associated with the disorder. 

44. The method of claim 32, wherein the gene regulated by the transcriptional 
regulator is likely causative of the disorder when the gene encodes an enzyme 
or signaling molecule which functions in a pathway that is impaired in the 
disorder. 

45. The method of claim 32, wherein the altered activity in the transcriptional 
regulator comprises at least one of the following: 

(a) an alteration in the binding affinity of the transcriptional regulator to 
DNA; 

(b) an alteration in the ability of the transcriptional regulator to bind to 
RNA polymerase, to an RNA polymerase holoenzyme, or to a second 
transcriptional regulator; 

(c) an alteration in the binding affinity of the transcriptional regulator to a 
ligand; 

(d) an alteration in expression level or expression pattern of the 
transcriptional regulator; or 

(e) an alteration in an ability of the transcriptional regulator to form 
homomultimers or heteromultimers. 

46. The method of claim 32, wherein the disorder is characterized by impaired 
function of at least one of the following: brain, spinal cord, heart, arteries, 
esophagus, stomach, small intestine, large intestine, liver, pancreas, lungs, 
kidney, urinary tract, ovaries, breasts, uterus, testis, penis, colon, prostate, bone, 
muscle, cartilage, thyroid gland, adrenal gland, pituitary, bone marrow, blood, 
thymus, spleen, lymph nodes, skin, eye, ear, nose, teeth or tongue. 

47. The method of claim 32, wherein the therapeutic comprises a small molecule 
drug, an antisense reagent, an antibody, a peptide, a ligand, a fatty acid, a 
hormone or a metabolite. 

48. The method of claim 32, wherein the subject is a mammal. 

49. The method of claim 48, wherein the mammal is a human. 

50. The method of claim 32, wherein the transcriptional regulator is a 
transcriptional activator or a transcriptional repressor. 

51. The method of claim 32, wherein the transcriptional regulator is native to the 
cell. 

52. The method of claim 32, wherein the transcriptional regulator is from a species 
different from that of the cell. 

53. The method of claim 52, wherein the transcriptional regulator is a viral 
transcriptional regulator. 

54. A method of treating or preventing type II diabetes in a subject, comprising 
administering to the subject a therapeutically effective amount of an agent that 
increases the global transcriptional activity of HNF4alpha. 

55. A method of treating or preventing a disorder associated with low 
transcriptional activity of HNF4alpha in a subject, comprising administering to 
the subject a therapeutically effective amount of an agent that increases the 
global transcriptional activity of HNF4alpha. 

56. A method of treating or preventing a disorder associated with high 
transcriptional activity of HNF4alpha in a subject, comprising administering to 
the subject a therapeutically effective amount of an agent that decreases the 
global transcriptional activity of HNF4alpha. 

57. A method of increasing the global transcriptional activity in a liver or a 
pancreatic cell comprising contacting the cell with an agent which increases the 
global transcriptional activity of HNF4alpha. 

58. A method of decreasing the global transcriptional activity in a liver or a 
pancreatic cell comprising contacting the cell with an agent which decreases the 
global transcriptional activity of HNF4alpha. 

59. A method of regulating the expression level of any one of the genes in Figure 
13 in a hepatocyte, the method comprising contacting the cell with an agent 
which regulates the transcriptional activity of HNF1alpha. 

60. A method of regulating the expression level of any one of the genes in Figure 
14 in a pancreatic cell, the method comprising contacting the cell with an agent 
which regulates the transcriptional activity of HNF1alpha. 

61. A method of regulating the expression level of any one of the genes in Figure 
16 in a hepatocyte, the method comprising contacting the cell with an agent 
which regulates the transcriptional activity of HNF6. 

62. A method of regulating the expression level of any one of the genes in Figure 
17 in a pancreatic cell, the method comprising contacting the cell with an agent 
which regulates the transcriptional activity of HNF6. 

63. A method of regulating the expression level of any one of the genes in Figure 
[...] which regulates the transcriptional activity of HNF4alpha. 

64. A method of regulating the expression level of any one of the genes in Figure 
19 in a pancreatic cell, the method comprising contacting the cell with an agent 
which regulates the transcriptional activity of HNF4alpha. 

65. A method of identifying transcriptionally active genes that are regulated by a 
transcriptional regulator in a cell, the method comprising 

(a) selectively isolating chromatin from a tissue; 

(b) identifying promoter regions from the chromatin that are bound by the 
transcriptional regulator; 

(c) identifying promoter regions from the chromatin that are bound by a 
member of the basal transcriptional machinery; and 

(d) comparing the promoter regions identified in steps (b) and (c) to determine 
overlapping genes, 

wherein the overlapping genes are transcriptionally active genes regulated by 
the transcriptional regulator. 

Fig. 1B 
Fig. 1C 
Fig. 2A 
Fig. 2B 
Fig. 3 

Flowchart (boxes in order): 

Disorder caused by Master Regulatory Gene Dysfunction 
Master Regulatory Gene (MRG) 
Identify Genes regulated by MRG or TR; Is MRG Broad or Narrow Acting? 
  Broad: MRG or TR is Target Gene For Therapeutic 
  Narrow: 
    Is Gene regulated by MRG or Transcriptional Regulator (TR) a TR? 
      Yes: Transcriptional Regulator (TR) Regulated by MRG 
    Is Gene Likely to be Causative in the Disorder? 
      Yes: Gene is Target Gene For Therapeutic 


Fig. 4 
Fig. 5 
Fig. 6A 
Fig. 6B 
Fig. 6D 
Fig. 7 




Fig. 8 
Fig. 9 
Fig. 10 
Name        RefSeq        Description 

Chaperons 
C4BPA       NM_000715     complement 4 binding protein a 
APCS        NM_001639     amyloid P component 
F11         NM_019559     coagulation factor XI 
C1S         NM_001734     complement component 1s 
VTN         NM_000638     somatomedin B 

Enzyme-Hydrolase 
PGCP                      glutamate carboxypeptidase 
GLA         NM_000169     galactosidase, alpha 
LIPA        NM_000235     lipase A 
SPO11       NM_012444     brUu-IIKB 
PAFAH2      NM_000437     platelet-activating factor 2 
AADAC       NM_001086     arylacetamide deacetylase 
PS-PLA1     NM_015900     phospholipase A1alpha 
VNN3        NM_018399     vanin 3 
CPB2        NM_016413     carboxypeptidase B2 
ANPEP       NM_001150     alanyl aminopeptidase 
HGFAC       NM_001528     HGF activator 
ENPEP       NM_001977     glutamyl aminopeptidase 

Enzyme-Ligase 
MCCC1       NM_020166     methylcrotonoyl-CoA carboxylase 
GARS        NM_002047     glycyl-tRNA synthetase 
TARS        NM_003191     threonyl-tRNA synthetase 

Enzyme-Lyase 
UROD        NM_000374     uroporphyrinogen decarboxylase 
PCK1        NM_002591     PEPCK1 
HPCL2       NM_012260     2-hydroxyphytanoyl-CoA lyase 
HAL         NM_002108     histidine ammonia-lyase 
FH          NM_000143     fumarate hydratase 

Enzyme-Oxidoreductase 
COQ7        NM_016138     coenzyme Q, 7 
ADH4        NM_000670     alcohol dehydrogenase 4 
UQCRC2      NM_003366     ubiq-cyt c reductase core prot. II 
CYB5-M      NM_030579     cytochrome b5 
CYP2E                     cytochrome P450, IIE 
CYB5        NM_001914     cytochrome b-5 
HSD17B2     NM_002153     hydroxysteroid dehydrogenase 2 
ADH1A       NM_000667     alcohol dehydrogenase 1A 

Enzyme-Transferase 
GCNT3       NM_004751     glucosaminyl transferase 3 
FNTB                      farnesyltransferase beta 
HNMT        NM_006895     histamine N-methyltransferase 
GOT1        NM_002079     aspartate aminotransferase 1 
UGT2B15     NM_001076     UDP glycosyltransferase 2B15 
GBE1        NM_000158     glycogen branching enzyme 

Enzyme Regulator 
SERPING1    NM_000062     C1-inhibitor 
SERPINA1    NM_000295     alpha-1-antitrypsin 
ITIH4       NM_002218     inter-alpha inhibitor H4 
AHSG        NM_001622     alpha-2-HS-glycoprotein 

Ligand Binding 
TMOD2       NM_014548     tropomodulin 2 
IGFBP1      NM_000596     IGF binding protein 1 
MT1X        NM_005952     metallothionein 1X 
CRP         NM_000567     C-reactive protein 
APOA2       NM_001643     apolipoprotein A-II 
Name        RefSeq        Description 

Signal Transduction-Other 
BIKE        NM_017593     BMP-2 inducible kinase 
SGK2        NM_016276     serum/glucocorticoid reg. kinase 2 
SEL1L       NM_005065     suppressor of lin-12-like 
SCYE1       NM_004757     small cytokine E1 
ANGPTL3     NM_014495     angiopoietin-like 3 

Signal Transduction-Receptor 
HAVCR-1     NM_012206     hepatitis A virus cellular receptor 1 
TACR3       NM_001059     tachykinin receptor 3 
GNB2L1      NM_006098     GTP binding protein, beta 2L1 
INSR        NM_000208     insulin receptor 
SSTR1       NM_001049     somatostatin receptor 1 
TM4SF4      NM_004617     transmembrane 4-4 
ASGR2                     asialoglycoprotein receptor 2 
GPR39       NM_001508     G protein-coupled receptor 39 
IFNAR1      NM_000629     interferon receptor 1 
TFRC        NM_003234     transferrin receptor 

Transcription Regulation 
ZNF300      NM_052860     kruppel-like zinc finger protein 
BCL6        NM_001706     B-cell CLL/lymphoma 6 
ZNF155      NM_003445     zinc finger protein 155 
FBXO8       NM_012180     F-box only protein 8 
NR0B2       NM_021969     small heterodimer protein 
HNF4a7      AF509467      HNF4alpha, alternate splice 
NR5A2       NM_003822     LRH-1/FTZ-F1 
ELF3        NM_004433     E74-like factor 3 
NR1D1       NM_021724     THRA1 
ATF2        NM_001880     activating transcription factor 2 
CREBL2      NM_001310     CREB-like 2 
RARB        NM_016152     RAR-beta 

Transporter-Channel/Pore 
SLC17A2     NM_005835     vesicular glutamate transporter 
AQP3        NM_004925     aquaporin 3 
SLC22A11    NM_018484     hOAT4 
GJB1        NM_000166     gap junction protein, beta 1 

Transporter-Lipids and Small Molecules 
APOH        NM_000042     apolipoprotein H 
ALB         NM_000477     albumin 
ABCC2       NM_000392     canalicular OAT 
G6PT1       NM_001467     glucose-6-phosphatase, transport 

Transporter-Proteins 
RAB6KIFL    NM_005733     RAB6 interacting, kinesin-like 
PEX13       NM_002618     peroxisome biogenesis factor 13 
TMP21       NM_006827     transmembrane trafficking protein 
RAB33B      NM_031296     RAS oncogene 
NAPA        NM_003827     alpha SNAP 
AP3M1       NM_012095     adaptor-related prot. complex 
SNX17                     sorting nexin 17 
Fig. 12 
Fig. 13 

Name             RefSeq 

AADAC            NM_001086 
ABCC2            NM_000392 
ACF              NM_014576 
ADH1A            NM_000667 
ADH1B            NM_000668 
ADH6             NM_000672 
AGT              NM_000029 
AHSG             NM_001622 
AK2              NM_001625 
AKR1C2           NM_001354 
AKR1C3           NM_003739 
AKR1C4           NM_001818 
ALB              NM_000477 
ALDH3A2          NM_000382 
ALS2             NM_020919 
AMBP             NM_001633 
ANGPTL3          NM_014495 
ANPEP            NM_001150 
AP3M1            NM_012095 
APCS             NM_001639 
APG3             NM_022488 
APOA2            NM_001643 
APOH             NM_000042 
AQP3             NM_004925 
AQP9             NM_020980 
ARHGAP11A        NM_014783 
ASGR1            NM_001671 
ASGR2            NM_001181 
ATF2             NM_001880 
AUTL1            NM_032852 
BATS             NM_004639 
BIKE             NM_017593 
BTN2A1           NM_078476 
C1S              NM_001734 
C2               NM_000063 
C4BPA            NM_000715 
C8B              NM_000066 
CCNE1            NM_001238 
CDCA1            NM_031423 
CISH             NM_013324 
CLYBL            NM_138280 
CNTNAP2          NM_014141 
CPB2             NM_016413 
CREBL2           NM_001310 
CRP              NM_000567 
CTSZ             NM_001336 
CYB5             NM_001914 
CYB5-M           NM_030579 
CYP2E            NM_000773 
CYP3A43          NM_022820 
DAF              NM_000574 
DC13             NM_020188 
DKFZP564O0463    NM_014156 
DKFZP586A0522    NM_014033 
DKFZP586M0122    NM_015425 
DLEU1            NM_005887 
DUSP6            NM_022652 
EIF4EBP2         NM_004096 
ELF3             NM_004433 
ENPEP            NM_001977 
F11              NM_019559 
FE65L2           NM_006051 
FH               NM_000143 
FKSG87           NM_032029 
FLJ10242         NM_018036 
FLJ10276         NM_018045 
FLJ10525         NM_018126 
FLJ10583         NM_018148 
FLJ10650         NM_018168 
FLJ10774         NM_024662 
FLJ11000         NM_016295 
FLJ11838         NM_024664 
FLJ12788         NM_022492 
FLJ13448         NM_025147 
FLJ13611         NM_024941 
FLJ14356         NM_030824 
FLJ20080         NM_017657 
FLJ20718         NM_017939 
FLJ21272         NM_025032 
FLJ21934         NM_024743 
FLJ22551         NM_024708 
FLJ23259         NM_024727 
FNTB             NM_002028 
G0S2             NM_015714 
G3A              NM_019101 
G6PT1            NM_001467 
GARS             NM_002047 
GBE1             NM_000158 
GCKR             NM_001486 
GDI2             NM_001494 
GIOT-2           NM_016264 
GJB1             NM_000166 
GOT1             NM_002079 
GPR39            NM_001508 
GPX2             NM_002083 
GRHPR            NM_012203 
GTF2B            NM_001514 
GTF2E1           NM_005513 
GTPBP3           NM_032620 
HABP2            NM_004132 
HAL              NM_002108 
HAO1             NM_017545 
HCAP-C           NM_022346 
HGD              NM_000187 
HGFAC            NM_001528 
HNF4A            NM_000457 
HNF4A            NM_000457 
HNF4a7           AF509467 
HNMT             NM_006895 
HPCL2            NM_012260 



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,RefSeq is/i'.,,; .■.Name ' 


Name             RefSeq

HPX              NM_000613
HSD11B1          NM_005525
HSD17B2          NM_002153
HSPC111          NM_016391
HSPC129          NM_016396
IFNAR1           NM_000629
IGF1R            NM_000875
IGFBP1           NM_000596
INADL            NM_005799
ITIH3            NM_002217
ITIH4            NM_002218
ITM2B            NM_021999
KIAA0022         NM_014880
KIAA0689         NM_014779
KIAA0844         NM_014951
KIAA0872         NM_014940
KIAA1041         NM_014947
KNG              NM_000893
LBP              NM_004139
LOC51060         NM_015913
LOC51098         NM_016001
LOC51326         NM_016632
LOC54518         NM_019043
LOC56902         NM_020143
LOC56486         NM_021211
LY6E             NM_002346
M17S2            NM_031856
M96              NM_007358
MAGEA9           NM_005365
MGC10500         NM_031477
MGC11034         NM_031453
MGC11266         NM_024322
MGC13010         NM_032687
MGC15435         NM_032367
MGC955           NM_024097
MIA2             NM_054024
MRPL15           NM_014175
MRPS18B          NM_014046
MSH6             NM_000179
MT1H             NM_005951
MT1L             NM_002450
MT1X             NM_005952
MTHFD1           NM_005955
MTP              NM_000253
NAPA             NM_003827
NET-2            NM_012338
NFKBIB           NM_002503
NPC1L1           NM_013389
NR0B2            NM_021969
NR1D1            NM_021724
NR5A2            NM_003822
NRD1             NM_002525
PAFAH2           NM_000437
PAX8             NM_013952
PCK1             NM_002591
PHF2             NM_005392
PIST             NM_020399
PLCB1            NM_015192
PLG              NM_000301
PLGL             NM_002665
PS-PLA1          NM_015900
PZP              NM_002864
RAB33B           NM_031296
RAMP             NM_016448
RARB             NM_016152
RBP5             NM_031491
RNGTT            NM_003800
RPL37AP1         ngZooosbb
SAC              NM_018417
SCYE1            NM_004757
SEL1L            NM_005065
SERPINA1         NM_000295
SERPINA10        NM_016186
SERPINA6         NM_001756
SERPINC1         NM_000488
SERPINE1         NM_000602
SERPING1         NM_000062
SGK2             NM_016276
SLC17A2          NM_005835
SLC22A11         NM_018484
SLPI             NM_003064
SNX17            NM_014748
SRI              NM_003130
SSA2             NM_004600
SSTR1            NM_001049
SSTR4            NM_001052
STRAIT11499      NM_021242
SUPV3L1          NM_003171
SYN3             NM_133632
TARS             NM_003191
TBPL1            NM_004865
TEF              NM_003216
TFRC             NM_003234
TIEG2            NM_003597
TIEG2            NM_003597
TM4SF4           NM_004617
TMEM1            NM_003274
TNFRSF6          NM_000043
UGT1A1           NM_000463
UGT2B11          NM_001073
UGT2B15          NM_001076
UQCRC2           NM_003366
VNN3             NM_018399
VTN              NM_000538
WBP4             NM_007187
WDF2             NM_052950
WDR12            NM_018256
XDH              NM_000379
XPC              NM_004628
ZK1              NM_005815
ZNF288           NM_015642
ZNF361           NM_018555
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Fig. 14 



Name             RefSeq

AADAC            NM_001086
ABCC9            NM_020297
ADH4             NM_000670
APOH             NM_000042
ARHGAP11A        NM_014783
B29              NM_031939
BCL6             NM_001706
BIKE             NM_017593
C4BPA            NM_000715
C6orf11          NM_005452
CDC45L           NM_003504
COL3A1           NM_000090
COQ7             NM_016138
CPXCR1           NM_033048
CRH              NM_000756
CTSZ             NM_001336
CYB5-M           NM_030579
DKFZP564J157     NM_018457
DLEU1            NM_005887
DOCK1            NM_001380
DSC1             NM_024421
EIF3S6           NM_001568
ELF3             NM_004433
FBXO8            NM_012180
FE65L2           NM_006051
FIL1(EPSILON)    NM_014440
FLJ10242         NM_018036
FLJ10252         NM_018040
FLJ10474         NM_018104
FLJ10650         NM_018168
FLJ11301         NM_018385
FLJ13273         NM_024751
FLJ13385         NM_024853
FLJ13448         NM_025147
FLJ14855         NM_033210
FLJ20156         NM_017691
FLJ20225         NM_019062
FLJ20234         NM_017720
FLJ20298         NM_017752
FLJ20643         NM_017916
FLJ20731         NM_017946
FLJ21272         NM_025032
FLJ22559         NM_024928
FNTB             NM_002028
GCNT3            NM_004751
GIOT-2           NM_016264
GLA              NM_000169
GNB2L1           NM_006098
GPR74            NM_004885
H4F2             NM_003548
HAVCR-1          NM_012206
HHLA2            NM_007072
HNF4a7           AF509467
IFNA10           NM_002171
INSR             NM_000208
KIAA0101         NM_014736
KIAA0399         NM_015113
KIAA0844         NM_014951
KIF13A           NM_022113
KIR-023GB        NM_015868
KIR2DS2          NM_012312
KIR3DL1          NM_013289
KRTAP1.1         NM_030967
KRTHA3A          NM_004138
LIPA             NM_000235
LOC113201        NM_138423
LOC113220        NM_138424
LOC51092         NM_015996
LOC5S906         NM_020147
MCCC1            NM_020166
MGC10500         NM_031477
MGC15677         NM_032878
MIA2             NM_054024
MRPL15           NM_014175
Nod1(-)6kb       NM_006092
NPY2R            NM_000910
NR0B2            NM_021969
NR2C2            NM_003298
NR5A2            NM_003822
PAFAH2           NM_000437
PAX8             NM_013952
PCNP             NM_020357
PEX13            NM_002618
PGCP             NM_016134
PRO2032          NM_018615
PSMA5            NM_002790
PS-PLA1          NM_015900
RAB33B           NM_031296
RAB6KIFL         NM_005733
SDCCAG10         NM_005869
SEL1L            NM_005065
SGK2             NM_016276
SLC26A7          NM_052832
SPO11            NM_012444
SRI              NM_003130
SSTR1            NM_001049
TACR3            NM_001059
TM4SF4           NM_004617
TMOD2            NM_014548
TMP21            NM_006827
UQCRC2           NM_003366
UROD             NM_000374
VNN3             NM_018399
WBP4             NM_007187
ZNF155           NM_003445
ZNF300           NM_052860
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Fig. 15A 
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-1 Oraianisni' 
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HNF4a 


TTR 
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human 
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Fig. 20B 
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Fig. 21A 
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Fig. 22A 
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Table S11. The feed-forward regulatory motifs in
pancreatic islets. The regulatory modules here were
derived as described in Supporting Online Material.
Feed-forward motifs involving only HNF1a and HNF4a are also
multi-input motifs, as they bind each other's promoters in a
multicomponent loop.
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Fig. 23A 
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Fig. 23B 
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Fig. 24 
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